Over the past couple weeks, roc and I have come up with a design that looks promising for web recording and deterministic replay: an rr like tool that can be integrated into Firefox and used by web and browser developers to debug content processes. Browser non-determinism originates from two sources: intra-thread and inter-thread. Intra-thread non-deterministic behaviors are non-deterministic even in the absence of actions by other threads, and inter-thread non-deterministic behaviors are those affected by interleaving execution with other threads, and which always behave the same given the same interleaving. Intra-thread non-determinism can be recorded and later replayed for each thread. These behaviors mainly originate from system calls (i/o and such), and can be instrumented at the library call level --- this means we can just use shims at library call sites in Gecko code rather than modifying the libraries themselves, and is helpful on Windows, where the library call API is stable/documented but not the system call API. Inter-thread non-determinism can be handled by first assuming the program is data race free: shared memory accesses which would otherwise race are either protected by locks or use mozilla::Atomic. If we record and later replay the order in which threads acquire locks (and, by extension, release locks and use condvars) then accesses on lock-protected memory will occur in the same order. Atomic variables can be handled either by treating reads and writes as if they were wrapped by a lock acquire/release during recording, or by treating reads as intra-thread non-determinism (record and replay the values produced). I like the latter more (less overhead), though it wouldn't work well with atomic pointer values so a mix of the two is probably best (needs experimentation). This design is broadly similar to rr, with some major differences: - This should work on all platforms and architectures supported by Gecko, without much port work required. - This will be part of Gecko itself, rather than a separate tool, which means both that developers won't need additional software to use it and that this can't be used to debug other software. - This can use multiple cores during recording and replay. - This does not preserve exact behavior. Context switches can occur at different times and data races can lead to different behavior between recording and replay. Data races are bugs in and of themselves, however, so this sort of non-determinism should be fixed regardless. There's one exception here that I know of: SharedArrayBuffer can be used by web content to introduce data races to the browser. Pages which use SharedArrayBuffer can still be recorded, but using a different technique from above which will probably use a single core for efficiency's sake. More generally, this design is flexible enough that it could diverge from rr in other ways. For example, it would be really slick if the existing debugger in the browser could be activated during replay without perturbing behaviors which the debugger can observe. This should greatly simplify implementations of a rewind debugger and/or omniscient debugger (build a database of behaviors like DOM/JS object modifications and executed scripts for querying) based on the replayer, but would mean the recording and replay executions could have different pointer values and code paths taken in some places. For now though, this bug is about building a prototype to test how well these ideas work for a basic record/replay system.
Initial WIP, attaching for posterity. This records NSPR thread/lock creation order and lock acquisition order into files, along with some system calls information for each thread. The record/replay architecture is all inside NSPR, and it seems best for now to keep it there and use NSPR for threads/locks everywhere in the browser (this patch already fixes up thread creation in the chromium message passing stuff to use NSPR) though I don't know how long that strategy will last. This patch intercepts system calls by adding new functions like PR_RecordReplay_kevent and replacing direct system calls with calls to the wrapper. I'm going to rework this strategy; on OS X system calls seem to all go through a userspace library (libsystem_kernel.dylib) whose code can be modified to redirect calls to wrapper functions. This should let system calls within complex libraries be intercepted without modifying the libraries, and moves the design closer to rr. I don't know if this will work on all platforms (or even if it will actually work on OS X) but it should simplify the prototype.
Assignee: nobody → bhackett1024
Updated WIP that modifies code in libsystem_kernel.dylib and creates trampolines so that system calls through this library can be redirected to a stub that performs the intended call and records information about the result. Right now recording seems to be working fine for all the system calls I can find running the content process on a basic website (nytimes.com) using dtruss. The code modification is pretty hacky (pattern matching disassembly of the code patterns seen in my computer's libsystem_kernel.dylib) but the general approach looks to be basically the same as other hooking tools (e.g. Detours from MSR, http://research.microsoft.com/pubs/68568/huntusenixnt99.pdf). The next step here is to record atomic variable accesses.
Attachment #8666530 - Attachment is obsolete: true
We have a similar hooking system in xpcom/build/nsWindowsDllInterceptor.h; perhaps there may be an opportunity for code re-use.
(In reply to David Major [:dmajor] from comment #3) > We have a similar hooking system in xpcom/build/nsWindowsDllInterceptor.h; > perhaps there may be an opportunity for code re-use. Cool. Yes, CreateTrampoline() in that file is almost shockingly similar to Redirect() in this patch. To reuse this we'll need to expand things to work on other operating systems and architectures but that doesn't seem too hard.
WIP with handling for atomics. This adds a template argument to mozilla::Atomic to control how much of their behavior is preserved during replay (similar to how the existing MemoryOrdering argument controls the reordering optimizations the compiler is allowed to do wrt the atomic). For now a couple atomics in the JS engine used for interrupts are not recorded at all --- these are accessed all the time, including from jitcode, and recording and replaying executions which are interrupted by the slow script dialog isn't really something that can practically be done (there are a couple other events that will prevent recording I think, like throwing overrecursion errors, but nothing major). All other atomics record enough information to preserve the exact order of their accesses, but there is flexibility to relax that as needed.
Attachment #8668518 - Attachment is obsolete: true
Updated WIP that fixes a lot of bugs in the record/replay infrastructure. Replaying still balks early during startup; the main issue I'm working on now is how to deal with Grand Central Dispatch (GCD), an OS X API for task parallelism that is built using pthreads and some kernel syscalls that are undocumented and probably shift around between OS versions. Since GCD has its own thread pool, even if we can record its behavior it will be very hard to rewind process state like we will eventually want to do, so I think the best thing here is to intercept calls to the public interface of GCD (which is hopefully fairly stable) and emulate its behavior using our own thread pool and synchronization. I don't know if GCD is used heavily anywhere in browser code; everything I've seen has been deep inside calls to functions in other system libraries.
Attachment #8669236 - Attachment is obsolete: true
AFAIK we don't use GCD in Firefox.
This patch improves the hooking infrastructure used and is able to intercept all calls into GCD.
Attachment #8678876 - Attachment is obsolete: true
This patch records a fair amount of GCD's functionality. The next step is to rejigger things so that all the GCD work items run serially on a single thread we create, which will make it straightforward to record execution of the work items. I'm hoping this won't introduce deadlocks due to artificial contention for the work item thread, but we'll see.
Attachment #8679613 - Attachment is obsolete: true
This patch records enough of GCD's functionality that a simple "Hello world" HTML file can be loaded. Unfortunately, forcing all the GCD work items onto a single thread doesn't work out, due to new deadlocks being introduced, so recorded data for work items is still not suitable for use in replay. Plan B is now to mimic GCD more closely during replay. During recording we keep track of the maximum number of simultaneously executing GCD work items (as well as work items that are executed in place when GCD is called by our own threads), and during replay create our own pool of threads and assign work items to them to maintain the same degree of parallelism. This is more complicated but should be able to replay any GCD execution.
Attachment #8681621 - Attachment is obsolete: true
This patch should have everything that is needed to record and replay GCD execution in the browser on a simple webpage. The next step is getting back to making replay actually work.
Attachment #8681714 - Attachment is obsolete: true
This patch fixes a fair number of issues with GCD replay. Attaching this now because I'm going to take a step back and try a new approach. Up to now these patches have been trying to instrument things at as low a level as possible, first at the system call level and then at the GCD level when that library posed problems for replay. The main problem I'm running into now is the same as when everything was at the system level. It's very easy for the replay to go off the rails and start behaving differently, either in a way we can detect (different order of system calls) or just by crashing in the middle of some library. Most of these libraries are closed source and debugging these problems is extremely difficult. More problematic though is that some of these libraries shouldn't even be expected to behave the same during replay. If a library's behavior is affected by thread interleavings --- it passes information between threads, lazily initializes things, etc. --- then the calls it makes won't be consistent between executions. And without modifying the source for such libraries this problem isn't really solvable. Hooking functions like pthread_mutex_lock would help, but a lot of synchronization is done using inlined atomic operations which can't be hooked. So I think a better interface to target might be the one between code compiled into the browser and external libraries the browser is linked against. Libraries which are well behaved and operate in a thread local fashion can still be recorded/replayed at the system call level, but more complex multithreaded libraries can be isolated during recording and then have their observable effects replayed later.
Attachment #8681976 - Attachment is obsolete: true
Brian, can you describe the goal here a little better? If it's to make it easier to gecko debuggers to debug gecko, then I can understand the design. But I don't see how this could be used to create a JS record-and-reply debugger. It seems too low level. If the user wanted set a new breakpoint, it seems like it would perturb our state enough that the trace would no longer be valid. Comment 0 addresses that, but I don't quite understand what the conclusion was.
(In reply to Bill McCloskey (:billm) from comment #13) > Brian, can you describe the goal here a little better? If it's to make it > easier to gecko debuggers to debug gecko, then I can understand the design. > But I don't see how this could be used to create a JS record-and-reply > debugger. It seems too low level. If the user wanted set a new breakpoint, > it seems like it would perturb our state enough that the trace would no > longer be valid. Comment 0 addresses that, but I don't quite understand what > the conclusion was. This should be usable for people debugging Gecko, but it is primarily intended for the browser's debugger and web developers. Whether this will work with the browser's debugger as it currently exists is still speculation, so comment 0 raised the question but that's all it is for now. With a design like rr the state can't be perturbed at all during replay without diverging completely from the recording, so using the browser debugger wouldn't work without massive modifications. This design offers enough flexibility that certain state, like what code is JIT compiled, when GCs occur, how many allocations occur, and so forth should be able (again, speculative) to vary between the recording and replay, in a way that would give the debugger enough room to do its thing. For example, we can't expect to record and replay mmap and mprotect calls exactly, but during replay can keep track of which memory is mapped in which way at snapshots so that we can rewind process state in a sensible way. If the presence of the debugger can affect the existence or timing of operations that are in the recording (for example, finalizers called from the GC which close file descriptors) then we would need to modify either the information we record or Gecko itself to be able to move forward.
Another question: how would we get the browser to the same initial state as the recording started in? Would we save the raw bits in the memory mapping or would we have some kind of serialization/deserialization for the entire heap?
(In reply to Bill McCloskey (:billm) from comment #15) > Another question: how would we get the browser to the same initial state as > the recording started in? Would we save the raw bits in the memory mapping > or would we have some kind of serialization/deserialization for the entire > heap? Right now the focus is on recording as soon as the process starts. We would only record e10s content processes, so from the user's perspective the "start recording" button or whatever would open a new tab in a new process whose behavior could be replayed later (also in a new process). Starting to record in the middle of executing an existing content process would be very cool but I don't know how feasible it is. I'm interested in using a memory snapshot based system for rewinding (this isn't required though, and the record/replay mechanism is largely separate from the eventual rewind and/or omniscient debugging mechanism), so we could potentially (again, all this is speculative) start recording at any point by taking a memory snapshot (would need to know all mapped memory in the process, which might require hooking mmap/mprotect whether we are recording or not), recording until the user desires to stop, then reverting the process to the initial snapshot for replay. Replaying such a partial recording in a new process would be even harder, and may be intractable if ASLR is in use.
This patch intercepts enough system library calls so that on a simple webpage we never enter GCD without passing through a function or logic whose behavior is recorded (and will not execute at all during replay). Generally this is done by hooking the library functions, though objective C messages are handled via manual instrumentation (I don't know how or whether these message invocations can be hooked). I really like how this approach is going; everything that is hooked is called directly from browser code, these functions are all documented public APIs (GCD has a tricky issue where other system libraries enter it using undocumented private APIs), and this strategy should carry over well to Windows and other OSes.
Attachment #8682744 - Attachment is obsolete: true
This patch fixes a lot of replay issues. Replay gets much further than in the past, though still not far enough to completely replay a simple webpage load. The new approach is still going very well, with replay issues much easier to track down and fix (since all the involved code is compiled with the browser rather than linked in).
Attachment #8683933 - Attachment is obsolete: true
Thanks Brian, this is really interesting. > Hooking functions like pthread_mutex_lock would help, but a lot of synchronization is done using inlined atomic operations > which can't be hooked. OK, that's good to know. Why do we need to wrap void library functions like, e.g., CTFontManagerSetAutoActivationSetting? Why do we need CopyInstruction? We might be able to get that code from an existing library. I'm pretty sure we could extract the relevant parts of DynamioRio for example. It's not clear to me what state is now being preserved between recording and replay. We'll need a precise definition of that.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #19) > Why do we need to wrap void library functions like, e.g., > CTFontManagerSetAutoActivationSetting? During recording, many of these void functions invoke system calls or other library calls in a way we can't precisely replay. This is mainly due to the libraries being multithreaded and performing synchronization in a way we don't record, but there's also drag due to not having source to these libraries and the ability to understand why replay broke down. During replay, we don't execute much of the code in these libraries, due to their unpredictable behavior. Even if a library API behaves predictably, many of the opaque heap structures (like CFStringRef) are replayed as 0x1 constants so the API will crash if we call it with that fake value. Right now we include information about all these void library functions in the recording, but that isn't necessary --- we could just hook them and nop them during replay but not include any information in the recording. > Why do we need CopyInstruction? We might be able to get that code from an > existing library. I'm pretty sure we could extract the relevant parts of > DynamioRio for example. CopyInstruction started as an expedient hack (it's similar to what nsWindowsDllInterceptor.h does) and then metastasized. At this point it's not much of a drag on development but in the long run it would be much better to get this functionality from a library. > It's not clear to me what state is now being preserved between recording and > replay. We'll need a precise definition of that. Yes, for sure. Right now the ideal is that the replay should preserve all logic (control flow and scalar values, but not pointer values) from the recording except for the logic that goes on inside system calls and Cocoa library calls. There are some exceptions: the replay can GC and Ion compile scripts differently (which has effects on other parts of the system, like nsISupports refcounts differing during replay), and some other library calls are emulated (e.g. localtime_r sometimes uses GCD). There's no effort to make sure we capture all Cocoa entry points, though, and this model is in flux and subject to further evolution. There are a couple main issues: - Not interacting with the system means that nothing will get painted, which we'll of course need to do for the replay to be sensible to users. A first cut might work to just record the bitmap data that is painted (maybe optimized to diff the data on subsequent paints) but I can't imagine that being a viable approach for graphics intensive pages like games. - The issue of attaching the JS debugger still looms. The GC/Ion compilation stuff above should help with this quite a bit, but there are other things like how to interact with the parent process (a separate IPC message loop, presumably) that will cause the replayed process to diverge further from the recording. Brian
This patch is able to completely replay the load of a simple website. Yay! Of course once the child process exits on its own the parent process thinks it crashed, but whatever. The next step is to get the simple graphics replaying mentioned in comment 20 to work (we already need to precisely record and replay graphics data, since some things like page thumbnails rely on it), which I think will require the parent and child processes to communicate with each other.
Attachment #8685487 - Attachment is obsolete: true
A fair amount of the previous patch is asserts added to debug replaying problems. This patch removes these.
Attachment #8688706 - Attachment is obsolete: true
Unless I'm totally misunderstanding things, the content process paints data by rendering layers and then sending updates to the compositor thread in the parent process via IPDL. There are at least a couple different ways to relay updates which the replayed process tries to perform to the actual parent process, but I think the most robust is for the replayed process to be able to communicate with the parent process via IPDL in a manner independent from the original recording. This should work well with future IPC we need to do, like having the parent process control the replay via rewinding, etc., and having the parent process communicate with debugger actors. The attached patch updates things so that during replay we create two new threads: one thread has a chromium I/O message loop which communicates with the actual parent process, and the second thread has a second message loop to process messages coming in from the parent process. These threads operate independently from the threads serving this role in the original recording (IOThreadChild and the main thread), which during replay act in lockstep with how they behaved in the recording and don't do any actual IPC. This patch is far enough along that we create an alternative to ContentChild, ReplayContentChild, which looks like it's communicating with the parent process, though as soon as the parent sends ReplayContentChild a message it forces a crash.
Attachment #8688745 - Attachment is obsolete: true
Separate IPDL threads sound good. However, it might be better to use completely different protocols instead of PContent etc since we have quite different needs.
This patch models enough of the layout IPDL in the replayed process that during replay we are able to render the graphics for a simple "Hello World!" webpage. I tried using a separate protocol from PContent, but ran into difficulties because it seemed like I would need to modify quite a lot of parent side code to tolerate interactions via multiple protocols (maybe there's a good place to refactor things, I don't know the parent side code at all really). This patch has skeletons for protocols like PContent and PBrowser that ignore most incoming messages, and allocates its own actors for layout protocols like PLayer and PTexture that are 1:1 with actors in the replayed content but interact with the actual parent process. During replay we watch for update messages from the replayed content and reflect those updates into ones for these alternate actors and send them to the parent.
Attachment #8691590 - Attachment is obsolete: true
This is a simple HTML file that uses setTimeout() to count from 1 to 10. Execution of this webpage can be completely replayed, including graphics. From here I want to use this as a testcase to get process rewinding and (afterwards) debugger attaching to work during replay. Combining the two should allow implementation of reverse breakpoints (find the last time a breakpoint executed).
Updated WIP that can record and replay the test.html above. This just fixes some bugs with the previous patch where shared memory buffers were not locked before being sent to the parent process (which will unlock them, and will get angry if the read count becomes negative) and where mach_msg IPC calls were not being recorded/replayed properly.
Attachment #8694735 - Attachment is obsolete: true
This patch is functionally the same as the last one but has been rebased to tip and cleaned up somewhat. At roc's suggestion I will break this apart into smaller patches to make it easier to follow and see how other parts of the browser are affected.
Attachment #8695315 - Attachment is obsolete: true
The record/replay infrastructure is in NSPR, so libnspr needs to be linked into more places in the build.
This has the recording/replaying infrastructure for thread/lock creation, lock ordering, atomic access ordering, thread events, function hooking, and alternate implementations of all the hooked functions. Sometime later this will be split up more.
This modifies mfbt/Atomics.h the NSPR atomic interfaces to automatically record and replay atomic accesses. In the former case this is done at a user-specifiable precision, via template parameters.
This patch has the remainder of the changes to NSPR, to keep track of ids on threads and locks and ensure that creation and uses of these are recorded and correctly replayed.
The chromium IPC code currently uses pthreads locking and threading primitives, which are not included in the recording. This patch changes things so that NSPR primitives are used instead.
Attachment #8696738 - Attachment is patch: true
This patch has changes on the parent side for interfacing with recording/replaying child processes.
This patch relaxes certain atomic and lock accesses so that they are not included in the recording. These are mostly related to GCs and JS helper thread behaviors (like Ion compilations), which are permitted to occur at different times in the recording and replay.
This patch changes the child process's behavior in a few places when recording or replaying is active, either to simplify recording/replaying or to avoid messy interactions with replay specific threads.
Most child side instrumentation of library calls is done via the hooking infrastructure in part 2, which doesn't require changing the code which calls those libraries at all. Code that interacts with libraries by sending mach messages needs to be manually instrumented, however, which this patch does. This is the bulk of the child side changes needed to get replay to work. I'm hoping that there is a way to do this so that the code sending the mach messages doesn't need to be modified, but I haven't yet looked into how mach messages work and whether this is possible.
Attachment #8696745 - Attachment is patch: true
This patch has the remaining manual child side instrumentation needed to get replay to work.
This patch has the changes needed to allow the replayed process to communicate with the actual parent process via IPDL messages and shared memory. It also has the implementation for reflecting layers/compositing messages from the child into messages which can be sent to the parent process; this stuff will be split off into another patch some time later.
This patch includes the graphics code side of the layers/compositing message reflection. When IPDL actors and shared memory regions are created, or update messages are sent, the replay IPC code is notified so that equivalent structures or messages can be created/sent to the actual parent process.
Thanks Brian! We discovered people already working on a similar project. See bug 1231802.
WIP for rewinding process state. This goes on top of the previous patches. Right now this records a series of snapshots as the program executes, one early on and (for now) later ones each time a compositor update is sent to the parent. Each snapshot after the first is a diff of heap and static memory vs. the previous snapshot --- the mach_vm_region syscall is used to find all mapped memory in the process at the first snapshot, and then mprotect and an asm.js-style signal handler are used to find regions modified before the next snapshot. The snapshots also include stacks and other state for each thread, recorded and restored with setjmp/longjmp and some inline assembly. Currently we can restore an earlier snapshot, though the main thread crashes shortly after doing so, so there's still work to do.
Rewinding in this patch works well enough that we can run to the point where a compositor update is first sent to the parent, rewind to when the initial snapshot was made, and continue running forward to the same point and rewinding ad nauseam. The main tricky bit to deal with has been that after rewinding all heap memory naively, pthread condition variables do not work right. I think that state in the process and state in the kernel are getting out of sync, but need a better understanding. For now the condition variables are allocated in a memory region whose contents are preserved when rewinding. The two main remaining steps for rewinding are (a) that we need to be able to rewind compositor state as well (right now the special replay threads for doing IPC and compositing are disabled if we want to rewind) and (b) that the parent process should be able to control when and how the child process is rewound, via a parent side developer toolbar I guess.
Attachment #8698587 - Attachment is obsolete: true
This patch enables the replay specific IPC threads to run while snapshots are taken and restored. The basic strategy is the same as with the other threads which are playing back from the recording --- save and restore the contents of their stacks and all memory they use. Since these threads are working with live file descriptors, however, there is at least the potential for things to get all messed up, but by waiting for all needed descriptors/channels to be fully initialized before the first snapshot is taken this strategy seems to work alright. (If this isn't viable in the long run there are more drastic options like tearing down the replay specific threads before taking a snapshot and rebuilding them afterwards, but I'd rather avoid this if possible.) This patch gets far enough along that we can run to the end of the page's execution, rewind to the beginning, and then cause the parent to crash when we resend it messages for creating IPDL actors it already knows about. The next step will be modifying the messages sent to the parent so that the compositor is able to maintain a coherent state even after the child process rewinds itself.
Attachment #8698921 - Attachment is obsolete: true
This patch is able to show graphics after rewinding; on the test page it runs to the end, rewinds to the beginning, then shows updates as it plays forward to the end again. This is managed by having the replay create all the graphics IPDL actors it will need up front (before the first snapshot is taken) using information gathered during the recording. When rewinding or fast forwarding, then, all the actors needed to construct a coherent state in the parent process are already in place. There still isn't anything complicated going on with reconstructing the compositor state which existed at some time when rewinding, as this page is so simple that all the graphics are contained in one or two tiles. Will wait on improving this until we start looking at more complicated pages. This patch also adds some more ignored locks and atomics so that GCs and CCs can occur at different times in the recording and replay (something I thought I did earlier but didn't actually do), so it's a good deal larger than the last patch.
Attachment #8700063 - Attachment is obsolete: true
This patch does the same stuff as the last one, but does so using a middleman process as described in comment 46. Overall I think this is a definite improvement, and will provide a lot more flexibility going forward. There is more boilerplate and overhead to marshal compositor updates from the replayed process into the middleman and then into the parent process, but the middleman is just a lightly modified Gecko content process --- it defers to the replayed process for compositor updates, but can do anything else a normal content process can do without worrying about how the process replay will be affected. I forgot to say earlier, but using the middleman also gets us to the separate protocols idea from comment 24. Instead of the replayed process having a skeleton PContentChild/PBrowserChild/etc. implementation, the middleman process has standard versions of these, and the middleman and replayed processes communicate with a specialized PReplay protocol.
Attachment #8701073 - Attachment is obsolete: true
(In reply to Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8] from comment #48) > (In reply to Brian Hackett (:bhackett) from comment #46) > > I've started looking at the devtools JS debugger stuff and how to attach > > breakpoints in the replayed process. I think the design here will need a > > little reworking. I want to be able to use the C++ JS debugger > > (js/src/vm/Debugger.* and some associated bits) in the replayed process, > > Is the goal to use all (most? the ones that make sense? read only?) of the > devtools during replay? Because while a lot of functionality is built on the > Debugger API, just as much is built on a random assortment of other XPCOM > APIs > and web APIs themselves (such as the "inspector" that lets one view and edit > the > dom tree and the styles applied to each node directly). Point being that just > re-implementing the Debugger API is not enough to get most of the devtools. > Even > just the JS debugger itself may need more. Right now the goal is to just use the JS debugger during replay. Eventually, it would be nice to use all the devtools that make sense for a replay debugger (Everything except performance/network? In principle even these two could make sense for the replay debugger but the overhead of recording could perturb them a lot). Everything would be read only to the devtools, since changing state in an observable way can cause the replay to diverge from the recording. Accesses to XPCOM objects in the replayed process can still be supported; instead of having XPConnect objects which call into XPCOM directly, the devtools code will have proxies whose accesses are resolved by sending synchronous IPDL messages to the replayed process. The patch I'm working on now does this for things like the doc shell and DOM window, and the result is fairly transparent to the devtools code (I have been modifying the devtools code to handle foobar.QueryInterface(), which could maybe be done transparently but would be pretty gnarly under the hood). I don't know yet whether this will be enough to support the DOM inspector. The main question is whether the information the inspector is getting --- the boundary/padding for an element, the element which is found by a right click -> Inspect Element, etc. --- can be determined solely from the replayed process' document tree and layout state, without making calls into the system graphics libraries. > If there are any devtools or Debugger API related questions I can answer, > please > ask away :) Sure, thanks!
This patch adds some integration with the JS debugger. It does enough that one can run the replayed process, and when it finishes open the debugger UI, see the sources in the replayed process, set a breakpoint on a line, click on a new 'rewind' UI button, have the replayed process rewind to the last point where the breakpoint was hit, and finally have the middleman process notify the chrome process about the pause so it can update the UI appropriately. Not much else works --- when paused we don't yet provide properties on the environment or visible objects to inspect --- but I feel this suffices as a proof of concept, that we can attach the debugger and have it inspect state in the replayed process without perturbing the replay. The prototype is, for lack of a better word, done. Things I'd like to do next: - Post a design document about the project which describes its organization, technical features, and the invariants it is maintaining. Is MDN the best place for this? - Post a cleaned up, rolled up patch and a new patch series breaking it down. - Start recording/replaying more complicated pages. The robustness of the various oarts of this project is I feel the biggest unknown at this point.
The best place for a design doc is probably wiki.mozilla.org.
(In reply to Boris Zbarsky [:bz] (Vacation until Jan 4) from comment #51) > The best place for a design doc is probably wiki.mozilla.org. My understanding is that MDN is best for code-related stuff, and wiki.m.o is for stuff relating to people, teams and planning. E.g. from https://wiki.mozilla.org/MozillaWiki:About: > To summarize: MDN documents how to interact with code. The Mozilla wiki documents how to interact > with teams. SUMO documents how to interact with Mozilla products (as an end-user).
Huh. It used to be that MDN was for documentation for how people outside the Mozilla project should interact with our code (e.g. the web developer documentation) and things like https://wiki.mozilla.org/Gecko:Overview lived... well, where it's living. I wonder when that was supposed to have changed.
Thanks, I put up the design document at https://developer.mozilla.org/en-US/docs/WebReplay
Rolled up patch, rebased to tip and cleaned up in some places.
Attachment #8695312 - Attachment is obsolete: true
Attachment #8696702 - Attachment is obsolete: true
Attachment #8696730 - Attachment is obsolete: true
Attachment #8696731 - Attachment is obsolete: true
Attachment #8696733 - Attachment is obsolete: true
Attachment #8696735 - Attachment is obsolete: true
Attachment #8696738 - Attachment is obsolete: true
Attachment #8696740 - Attachment is obsolete: true
Attachment #8696742 - Attachment is obsolete: true
Attachment #8696743 - Attachment is obsolete: true
Attachment #8696745 - Attachment is obsolete: true
Attachment #8696746 - Attachment is obsolete: true
Attachment #8696750 - Attachment is obsolete: true
Attachment #8696752 - Attachment is obsolete: true
Attachment #8701885 - Attachment is obsolete: true
Attachment #8704372 - Attachment is obsolete: true
Link NSPR in more places, and perform all other build system changes (some of which don't fit cleanly into any of the remaining patches).
This has the record/replay infrastructure from the earlier patch, plus the rewind infrastructure, which is pretty closely intertwined. The hooking parts of the record/replay infrastructure have been separated into a new file.
The atomics interface changes are generally unchanged from the earlier patch.
The NSPR locking/threading changes are also very similar to those in the previous patch.
Changing over the chromium IPC code to use NSPR primitives is pretty much the same as in the earlier patch.
Chrome process changes are even more minimal now that during replay it is interacting with an almost normal content process.
The amount of unrecorded atomics/locking behavior is more expansive now that we're making sure no activity is recorded during GC/CC.
Behavior changes on the side of the recording/replaying content process vs. a normal content process are also more extensive now. A lot of this is disabling logging/etc. code during record/replay, which is easier to do than to make sure the logging does not affect the recording. It also includes some things like avoiding posting runnables during a GC, which is preventing incremental GC/CC from being used during recording/replaying and is a restriction I would like to fix.
This is pretty much unchanged from the earlier patch.
The additional instrumentation required to get replaying to work is also pretty much unchanged from the earlier patch.
This patch has the PReplay IPDL protocol and the middleman process side of its implementation, as well as other middleman specific code for avoiding loading documents / rendering graphics of its own, and for conveying graphics updates from the replaying process to the chrome process and debugger messages from the middleman's JS debugger to the replaying process.
The other side of the PReplay protocol, which lives in the replaying process. This patch conveys graphics updates to the middleman process, and debugger messages from the middleman process to the replaying process side of the JS debugger. It also has some modifications to the IPC code so that the replaying process can pause replay specific threads before taking or restoring a snapshot.
Changes to the C++ JS debugger to accommodate the replay JS debugger. Currently, all debug objects/scripts/etc. in a Debugger instance are wrappers around a JS object/script/etc. in another compartment. This patch allows the Debugger to have an associated ReplayDebugger, which stores different data in the debug objects/scripts/etc. and has alternate implementations for the various methods on them. This patch has the Debugger code to manage this dichotomy; the actual implementation of ReplayDebugger is in the next patch.
This patch has the implementation of ReplayDebugger on the middleman and replaying process sides; whenever the Debugger needs information about an object/script/etc. it calls a method on its ReplayDebugger which either gets the information from local state or calls a hook to send an IPDL message to the replaying process, which when received calls yet a separate hook to go back into ReplayDebugger code and compute the answer to the query.
As in the earlier patch, this includes spots where the replayed portion of the replaying process calls into IPC code so that a message can be sent to the parent process. The format of graphics messages in the PReplay protocol are different, however, from those in PLayerTransaction (layers, textures, etc. are referred to using indexes, rather than actors) which required some refactoring of the layer messages to avoid duplicate code; this patch also includes changes to the graphics code to handle this refactoring.
This patch adds a rewind button to the debugger client's UI, which when pressed sends a message to the debugger server living in the middleman process.
On the server side of the devtools code, this patch has changes to receive the rewind button event and pass it onto the C++ JS debugger. It also has some other changes needed for replaying, mainly for interacting with the proxies created for the docshell/window/etc. in the replaying process, which don't (yet?) support generic QI calls. This is the last patch in this series.
The middleman process design sounds like a really good idea. Thanks!
Thanks for the MDN document. It's really good. We might be able to simplify the graphics situation by switching rendering to use Skia or the cairo image backend instead of CoreGraphics on Mac. I assume the issue of asm.js's mprotect handlers interfering with WebReplay's can be fixed. A few things we need to figure out soon-ish: 1) Can this work on Windows? It might be a good idea to have someone try porting it to Windows who's not you. 2) How much work is it to extend to more browser features? 3) What's the long-term maintenance burden going to be? 4) Whether we really want direct debugging of a replayed process or we should target an omniscient approach. I haven't done anything here yet...
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #74) > Thanks for the MDN document. It's really good. > > We might be able to simplify the graphics situation by switching rendering > to use Skia or the cairo image backend instead of CoreGraphics on Mac. This would still require using CoreGraphics for things like rendering fonts, right? I don't think that intercepting the CoreGraphics routines is the main problem (CG is a well documented public API), but my biggest concern right now is the overhead of recording graphics data produced by CG. I'll attach a patch and some more info on this in a bit. > I assume the issue of asm.js's mprotect handlers interfering with > WebReplay's can be fixed. Yes. > A few things we need to figure out soon-ish: > 1) Can this work on Windows? It might be a good idea to have someone try > porting it to Windows who's not you. I don't know of anything that would prevent this from working on Windows. Recording/replaying is all based on hooks, which we already use in other places on Windows, but the rewinding infrastructure does more systemy things like crawling the virtual address space and using a signal handler. These things look like they have analogues on Windows, e.g. VirtualQuery looks functionally similar to mach_vm_region. If anyone wants to work on this that would be great, otherwise I can, though probably not until March. I'll be on PTO most of February. > 2) How much work is it to extend to more browser features? I'll post another patch in a bit, but it took a few days to get from recording/replaying the text only test page to recording/replaying the wikipedia home page, which does new things like loading images, using CSS, having a complex page layout, etc. The main feature I'm concerned about is WebGL, and I hope we can handle it by just hooking a bunch of new API functions and doing something similar to what we're doing with CoreGraphics. > 3) What's the long-term maintenance burden going to be? I think there are three categories to consider: - Within the record/replay system, the main burdens should be for handling breakages with new OS versions and for hooking new APIs being called by the browser. - The glue code for replay IPC, graphics messages, and debugger integration would need to be updated to continue working after changes to those browser components. - In other parts of the browser that don't directly interact with the record/replay mechanism, I would like to minimize changes as much as possible, which should also minimize the maintenance burden. I think things are pretty good right now, though the mach message intercepting is still pretty ugly and there are some other improvements that would be nice to make, like allowing the GC to work the same way whether recording/replaying or not.
With this patch, the main page for Wikipedia can be recorded and replayed. As noted above, while this is a pretty static page it uses a lot more browser features than the basic test page did. The changes in this patch are mostly internal reorganization/improvements in the record/replay code. Other changes include some new APIs being hooked, messages being intercepted, and several hash tables that need to behave deterministically (the handling for these hash tables is pretty low level currently, I want to fold these into the hashtable classes themselves at some point).
This patch is still pretty preliminary but illustrates what I'm hoping to with graphics data. As I understand it the tile based renderer creates a bunch of CGBitmapContexts which render graphics updates to shared memory buffers managed by IPDL textures. Right now we handle these during record/replay by keeping track of API calls which will change the rendered data, and then at points where the data is needed (e.g. before sending a graphics update to the parent) recording the contents of the graphics data buffers so they can be restored later. This is simple and it works but it produces an unacceptable amount of data. When loading Wikipedia the size of the recording is 176MB, 88% of which is graphics data. If I load Wikipedia, scroll to the bottom and then back to the top, the size of the recording is 477MB, 95% of it graphics data. This patch leverages the hooking of the CoreGraphics and other Core APIs to avoid including any of this graphics data in the recording. When the process is replaying, when these calls are performed they are recorded in a separate binary stream, and when a graphics update is sent to the middleman that binary stream is also sent to the middleman, so that it can perform the graphics calls itself and render graphics to the memory buffers it shares with the replaying process. It's possible that the middleman could render the graphics data directly to the shared memory buffers it shares with the chrome process, eliminating the need for sharing graphics buffers with the replaying process, but this would require that the replay work without ever accessing the graphics data and that is not currently the case. Currently this works with a simple 'Hello World!' page. Handling more complicated pages should only require improving this binary stream to work with other APIs that are already being hooked.
> > 2) How much work is it to extend to more browser features? > > I'll post another patch in a bit, but it took a few days to get from > recording/replaying the text only test page to recording/replaying the > wikipedia home page, which does new things like loading images, using CSS, > having a complex page layout, etc. The main feature I'm concerned about is > WebGL, and I hope we can handle it by just hooking a bunch of new API > functions and doing something similar to what we're doing with CoreGraphics. I'm also worried about the media stack, and third-party code like video codecs and the WebRTC stack that use system APIs and custom threading. (In reply to Brian Hackett (:bhackett) from comment #76) > With this patch, the main page for Wikipedia can be recorded and replayed. > As noted above, while this is a pretty static page it uses a lot more > browser features than the basic test page did. The changes in this patch > are mostly internal reorganization/improvements in the record/replay code. > Other changes include some new APIs being hooked, messages being > intercepted, and several hash tables that need to behave deterministically > (the handling for these hash tables is pretty low level currently, I want to > fold these into the hashtable classes themselves at some point). Yeah, iterating nsPtrHashKey tables was obviously going to be an issue. It might be a good idea to just eliminate that, independently of record/replay. Your graphics recording approach might be able to use Moz2D's CreateRecordingDrawTarget, which in principle does what you need.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #78) > Your graphics recording approach might be able to use Moz2D's > CreateRecordingDrawTarget, which in principle does what you need. Great, I will try doing this. Ideally all the CoreGraphics interaction would be behind the DrawTarget and other 2D abstractions, so that we don't need to hook the graphics functions at all. Instead different draw targets would be used, so that while recording the CG target is active but while replaying a recording target is active whose events are forwarded to a CG target in the middleman process. This design should work on other platforms as well.
This patch expands on recording/execution of graphics commands so that during replay of the home page for Wikipedia the entire page is rendered correctly. Next I'll try interposing DrawTargetRecording as described above to do the same thing.
Attachment #8707023 - Attachment is obsolete: true
This is a replacement for the previous graphics patch which uses DrawTargetRecording instead of hooking CoreGraphics API calls to perform drawing in the middleman instead of the replaying process. During both recording and replay a DrawTargetRecording is active; when recording the DrawTarget calls are forwarded to the normal native DrawTarget as is normally done by DrawTargetRecording, and when replaying the DrawTarget calls are simply recorded, with no underlying native DrawTarget. This allows many of the CoreGraphics hooks to be removed. With this patch, the replaying process is able to render the Wikipedia homepage. There are still a couple issues: - Font data consumes over 80% of the recording data on Wikipedia, and it would be better to have an efficient way of encoding them between recording/replay. The earlier patch just recorded the full names of any system fonts being created, and doing something similar here when possible would be good. - nsNativeThemeCocoa::DrawWidgetBackground gets the CGContext from a DrawTarget and draws things directly to it, for what looks like native widgets like checkboxes and such. These uses won't be captured by the DrawTargetRecording, though this is a more general issue with DrawTargetRecording and not specific to record/replay (AFAICT other than here DrawTargetRecording is not used on Macs, e.g. before this patch there was no way for the decoder to construct fonts for drawing on platforms other than Windows).
Attachment #8707484 - Attachment is obsolete: true
This patch records full font names when available from the CGFontRef instead of the font's tables, bringing the total size of the recording on Wikipedia down to around 22 MB (which is where it was with the earlier CoreGraphics based patch).
This patch goes on top of the previous ones and includes various fixes to improve robustness. With this patch I can record and replay a simple GMail session --- from a fresh profile, the login page is loaded, I log in, and the inbox loads and displays. The total recording size is about 40 MB, which seems reasonable and should improve if we start doing in memory recording compression. Everything about the page looks right, except for animations that stay on their initial frame (e.g. the avatar that shows up on the login screen). I don't know what to do about these yet, especially when they are based on absolute time stamps and when the debugger is active on these pages. Anyways, the main new issue this patch exposed is what to do about places where non-deterministic parts of the browser interact with deterministic parts. In particular, there are several nsISupports classes --- nsGlobalWindow, HTMLImageElement, and so forth --- that can be destroyed during GC and trigger recorded events in their destructor, and cross compartment wrappers that are destroyed during GC sweeping can trigger recorded events when they are regenerated later. There are a few ways these could be handled: A) Move the recording boundary so that recorded events are never triggered by these operations. This is conceptually the cleanest, and is just an extension of what we already do with e.g. atomic refcounts and other quantities that change during GC. It will probably require the most work, though, as some of these destructors do things like posting events to the main thread's event loop and sending IPDL messages, which will need new infrastructure to separate out the deterministic and non-deterministic portions of these other systems. B) Ensure that destruction of the data in these classes are performed consistently, using an event on the main thread that fires every second or so. During recording, objects which can trigger events have their destruction delayed until this event next fires. During replay, keep the corresponding objects alive until the same event firing, and then either destroy them or clean up their internal data, depending on whether they still have other references. I tried to do this approach, but don't like it that much. C) Force these objects to behave deterministically, by simply never destroying them. This is of course the simplest approach and is the one this patch takes. It might be viable in the long run, since recordings are expected to be relatively short lived, but it still doesn't seem ideal. Probably the best strategy is a mix of A) and C). Start by just not destroying such objects in a non-deterministic fashion, and then find some minimally invasive ways to improve Gecko so that the destruction of these objects does not interfere with the recording and can happen as it normally would.
This patch makes enough changes so that the build works on Windows. It doesn't do anything else, but since this patch is so large (mostly from code reorganization and renaming variable types to appease cl) it seems better to split off from changes with actual substance.
This patch goes on top of the previous one and improves the Windows build so that all thread/locking activity is recorded and so that the hooking infrastructure works. Right now this just hooks a few functions, the next step is to expand this to a more complete set.
FWIW, I just noticed that there's a TimeTravelDebugging branch of https://github.com/Microsoft/ChakraCore/commits/TimeTravelDebugging. I haven't looked at their implementation and how it compares to this one.
Updated rebased and rolled up patch. Back in early February I did a fair amount more on windows, adding hooks for many of the kernel32 interfaces, but things don't work yet. From here I'm going to change tack a bit, mostly working on Mac and improving the quality of the patch and robustness of the technology, aiming towards getting a solid, interesting demo ready. I'll continue working on improving the windows side of things, though with lower priority.
Attachment #8705209 - Attachment is obsolete: true
Attachment #8707012 - Attachment is obsolete: true
Attachment #8708541 - Attachment is obsolete: true
Attachment #8708687 - Attachment is obsolete: true
Attachment #8711036 - Attachment is obsolete: true
Attachment #8711510 - Attachment is obsolete: true
Attachment #8713640 - Attachment is obsolete: true
Updated rolled up patch. This makes several improvements to the basic record/replay infrastructure --- records are compressed in memory while being written to disk, limits on the number of threads and locks and so forth that can be tracked are removed, hash tables are used instead of vectors for associating information with objects. It also fixes some things so that executions can be recorded/replayed with a --disable-debug --enable-optimize build.
Attachment #8727462 - Attachment is obsolete: true
Rolled up patch with the last couple weeks of work. This changes how locks and cvars work during replay so that instead of using the native pthreads code, file descriptor pipes are used by threads to signal each other. This avoids making special pthreads related system calls, which don't seem to work reliably on OS X after rewinding process state, and is also more efficient, as threads can signal one another when acquiring locks in the recorded order without needing to acquire a particular highly contended global lock. This also fleshes out the debugger quite a bit, so that state can be inspected while paused at a breakpoint and the UI can be used to rewind or advance across all the points where a breakpoint is hit. The next, well, step is to get the stepping handlers and their rewinding counterparts to work.
Attachment #8730218 - Attachment is obsolete: true
Rolled up patch, rebased to tip. It's been a while, so I'll update the individual patches which go into this one.
Attachment #8705210 - Attachment is obsolete: true
Attachment #8705212 - Attachment is obsolete: true
Attachment #8705213 - Attachment is obsolete: true
Attachment #8705214 - Attachment is obsolete: true
Attachment #8705215 - Attachment is obsolete: true
Attachment #8705217 - Attachment is obsolete: true
Attachment #8705219 - Attachment is obsolete: true
Attachment #8705223 - Attachment is obsolete: true
Attachment #8705225 - Attachment is obsolete: true
Attachment #8705228 - Attachment is obsolete: true
Attachment #8705230 - Attachment is obsolete: true
Attachment #8705231 - Attachment is obsolete: true
Attachment #8705236 - Attachment is obsolete: true
Attachment #8705237 - Attachment is obsolete: true
Attachment #8705244 - Attachment is obsolete: true
Attachment #8705246 - Attachment is obsolete: true
Attachment #8705256 - Attachment is obsolete: true
Attachment #8736310 - Attachment is obsolete: true
The record/replay infrastructure now uses LZ4 to compress records in memory as they are written to disk, so this patch moves the in-tree version of LZ4 from mfbt to nspr.
This patch has the system/library call redirections for Mac and (still incomplete) Windows, and the infrastructure for enabling redirections.
Attachment #8737849 - Attachment description: Part 15 - Graphics layers/shmem changes for replay IPC → Part 15: Graphics layers/shmem changes for replay IPC
Rolled up patch updated with the last couple weeks of work. This mainly overhauls how breakpoints are handled when debugging a replaying process, and gets forward and reverse stepping to work. For now, stepping is handled by setting breakpoints at specific spots instead of setting a handler for a specific frame. This should work fine except when a script is invoked recursively. Eventually the replaying process should be able to keep track of frame identity even after rewinding, which would allow setting handlers on the frames themselves and fixing this.
Attachment #8737826 - Attachment is obsolete: true
Diff on top of the previous patch with the last couple weeks of work. This has various bug fixes and robustness and efficiency improvements. The most important changes are: - PLDHashTables (along with the various classes that are based on them) all have a deterministic iteration order when recording or replaying. This allows removal of all other hashtable related instrumentation in the browser, except for one spot that uses std::set (whose code we can't modify). - Rendered graphics data can be sent both ways between the replaying process and the middleman. This allows the replaying process to access the rendered graphics when needed, and also lets the middleman leverage the snapshot mechanism to restore earlier graphics data when updating compositor state after rewinding. I've started working through recording and replaying the web-platform tests, one hundred or so at a time. Currently we can record and replay all the 2dcontext tests (about 800 tests, or 10% of the suite). Three of them give different pass/fail results from a normal execution, as we currently use the CG draw target when recording/replaying, instead of Skia (though since the backend in use is hidden behind the DrawTargetRecording and never renders anything in the replaying process, fixing this shouldn't be too hard). From here I want to go through the rest of the web-platform tests, both to fix bugs and to see what browser features we can easily support while recording/replaying. With enough work it shouldn't be terribly hard to get all features tested by web-platform to record/replay, but I want to shift away from building out the prototype to handle new features (like webgl) and towards building a, uh, minimum viable product, something that people can start using and testing. Part of that is figuring out which features we do or don't want to support, and cleanly failing to record when dealing with an unsupported feature instead of just crashing somewhere.
With this patch, all web-platform tests run by mach can be recorded and replayed, with the following exceptions/caveats: - The custom threading used when playing audio/video is not handled yet, so this patch prevents these from being activated when recording. - DOM workers are similarly not allowed to be created. While recording/replaying executions with workers shouldn't be hard to do (except when SharedArrayBuffers are in use), right now the replay debugger only supports JS which runs on the main thread, and if we record/replay executions with workers then we should be able to debug the workers. A couple other technologies aren't tested by web-platform but also need to be disallowed: - WebGL uses a different graphics pipeline from normal rendering. - asm.js creates signal handlers which interfere with the handler used while replaying. Again, per comment 114 it should be possible to get all of these to work, but in the interest of focusing on improving the quality and robustness of record/replay on 'normal' web pages and on porting this stuff to other platforms (i.e. Windows) these restrictions are in place for now.
Attachment #8747462 - Attachment is obsolete: true
Rolled up patch with all changes as of a few weeks ago. I'll attach a couple patches with new stuff shortly.
Some fit-and-finish changes to the Mac port, going on top of the previous patch. This has some bug fixes and internal improvements, but the main change is to allow triggering record/replay in a new tab from the browser's UI (Tools -> Web Developer -> Record/Replay Execution}, instead of using environment variables at the command line.
Windows port work from the last couple weeks. This mostly focuses on redirecting windows system library functions, though this is not yet far enough along to record/replay a simple web page. The main neat thing about this patch is that it can hook all functions in a given DLL to make sure that we never enter the DLL except by calling through a function in the DLL (or another system DLL) that has been manually hooked. This gives a much stronger check than in the Mac port for where the recording boundary is placed.
Some changes to the Mac port from the last few weeks. This is mainly bugfixes and some reorganization, and starts an overhaul of the snapshot mechanism, representing snapshots as old copies of pages instead of diffs between snapshots (this is simpler and takes more memory/disk space, but makes it easy to discard old snapshots and avoid using a ton of disk space in long running executions), as well as doing most snapshot work off the main thread. Most of what I've been working on the last few weeks has been on the Windows port, though, I'll attach a patch for that tomorrow.
Attachment #8757432 - Attachment is obsolete: true
Updated Mac WIP with a now-functioning overhauled snapshot mechanism. Replay forwards is much faster now, due to the combination of simpler snapshots (old page copies instead of diffs), serializing snapshots off the main thread, and (most importantly) from not taking snapshots on every single compositor update. Right now snapshots are normally taken every 30 frames when going forwards (this will need adjustment based on workload), and when rewinding we first go back and generate all the intermediate snapshots that were skipped over, so that animations are rewound without skipping any frames. This snapshot generation causes the replaying process to stop showing new frames for a few seconds, so is pretty noticeable; rather than masking over this (on long running replays we will probably want to go up to 100 or more frames between snapshots) the UI could show a "Rewinding in progress" message with a progress bar or something when this happens.
Attachment #8763175 - Attachment is obsolete: true
Updated Windows WIP. This mostly adds a lot of redirections, including some nice clean handling of COM object interfaces, and is I think getting close to being able to replay simple web pages.
Attachment #8757439 - Attachment is obsolete: true
Rolled up and combined patch (for both mac and windows), incorporating the last couple of weeks of work. Windows replaying continues to slowly progress, and this also has some nice fixes for Mac like removing all the manual instrumentation of objective C messages (the core objc_msgSend routine is hooked instead) and a general cleanup and C++ification of the public API.
Attachment #8737828 - Attachment is obsolete: true
Attachment #8737830 - Attachment is obsolete: true
Attachment #8737832 - Attachment is obsolete: true
Attachment #8737833 - Attachment is obsolete: true
Attachment #8737835 - Attachment is obsolete: true
Attachment #8737836 - Attachment is obsolete: true
Attachment #8737837 - Attachment is obsolete: true
Attachment #8737838 - Attachment is obsolete: true
Attachment #8737839 - Attachment is obsolete: true
Attachment #8737841 - Attachment is obsolete: true
Attachment #8737842 - Attachment is obsolete: true
Attachment #8737843 - Attachment is obsolete: true
Attachment #8737844 - Attachment is obsolete: true
Attachment #8737846 - Attachment is obsolete: true
Attachment #8737847 - Attachment is obsolete: true
Attachment #8737848 - Attachment is obsolete: true
Attachment #8737849 - Attachment is obsolete: true
Attachment #8737850 - Attachment is obsolete: true
Attachment #8737851 - Attachment is obsolete: true
Attachment #8737852 - Attachment is obsolete: true
Attachment #8767746 - Attachment is obsolete: true
From now on I'm going to use a bitbucket repo for keeping track of the most recent series of patches for this bug. The link is here: https://bitbucket.org/bhackett1024/mozilla-web-replay/. I've also linked this from the web replay design doc in the URL field of this bug (and would link the repo directly there too if the URL field supported multiple urls). Anyways, the past few weeks I've mostly been experimenting with different ways of handling objects whose destructors have recorded events and are triggered by GC/CC. Up to now these objects have all just been permanently leaked (except for some nsPipe-related streams whose destructors *must* run for the browser to work, and which required some ugly workarounds), which is gross and wasteful and requires some weird instrumentation sprinkled around the browser. I tried making the destructors for these specific objects run at deterministic points, but while that improves things a bit the instrumentation required is even worse. The new plan is a slight redesign of the boundary between the deterministic and non-deterministic parts of the browser. Previously, the GC was non-deterministic, and since references on DOM/COM/etc. objects are released via the GC the destructors of these objects were also non-deterministic. Now, reference releases performed by the GC are deterministic. During recording we keep track of the points when deferred releases are triggered, and schedule them so that they occur at a specific point in the recording which we can find later (rather than wherever the GC itself happens). During replay, GC finalization does not actually release references, but rather the references are released at the same point they were released in the recording. (This design means that during replay some GC objects can hold dangling pointers to already-finalized objects, but since the object will have already been finalized at this point in the recording it never be used again in the replay.) Since non-determinism has been isolated to the GC, objects should be destroyed at the same points during recording and replay and no instrumentation in individual classes is needed.
The latest changesets on the bitbucket repo have some new logic for handling a kind of tricky issue with the debugger. Even when inspecting the JS state of a page, it's pretty easy to get the debugger in the replaying process to try to access parts of the system, e.g. accessing the 'font' property of a 2d canvas context will cause objective C messages to be sent to the system. We can't execute these calls/messages in the replaying process, as they were never performed during the recording. Previously these calls would kill the replaying process, and we were trying to avoid them using some (ad hoc, woefully incomplete) tests. Now, whenever a debugging IPDL message is received that might hit one of these calls/messages, we first take a snapshot. If a call/message is hit later, the snapshot is restored, the IPDL message is responded to with an indication of the failure, and the middleman process (where the debugger server lives) can respond to it appropriately.
Rolled up patch for an initial landing. Only the Mac port works here, and testing is still pretty limited --- though there are (just one right now, soon to be more) automated tests. Structurally the code is in good shape, and I don't see it changing much anytime soon, so landing code now will allow people to try things out, allow fuzzing, and allow integration into the Devtools. Tomorrow I'll post broken up parts of this patch for review, and next I'll mainly be working on finishing the Windows port, maybe the Linux port, and bugfixing. Some information about the overall patch: New code: 32k lines of code in new record/replay files. 10k of this is the udis86 library, most of which is static tables (should have a small compiled code size). JS changes: 1767 insertions, 819 deletions. Most of this is in the Debugger. non-JS changes: 2023 insertions, 608 deletions
The core record/replay/rewind code and public API. The public API is in NSPR, while the implementation is in XUL (toolkit/recordreplay). An alternative here would be to put the API in mfbt, but that is problematic for C code which needs to use the API (this could be worked around).
Add the BSD licensed libudis86 library. Before this lands I'll file a dependent bug for adding this to about:license.
The redirection infrastructure, along with redirections for Mac and Windows.
Initialize and teardown record/replay state. Replaying processes don't exit normally, but rather kill themselves after receiving a Terminate message in the PReplay protocol (in a later patch).
The crash reporter doesn't work correctly while replaying, so this patch disables it while recording/replaying. (This should probably be fixed relatively soon.)
The ordering of atomic operations (mozilla::Atomic or PR_ATOMIC_*) is recorded by default, but this can be avoided (with no associated recording overhead) by using Atomic<AtomicRecording::PreserveNothing> or PR_ATOMIC_*_NO_RECORD.
Change the atomics unit tests to not record/replay activity. Otherwise there are link errors I think.
The chromium windows message pump uses InterlockedExchange for performing some atomic operations, which can't be redirected (it's a compiler intrinsic or something). These operations need to be recorded for consistent replay, so this patch changes these over to PR_ATOMIC.
Allow atomic refcounting classes (RefCounted, AtomicRefCounted) to specify whether to record refcount activity.
Allow xpcom mutex and monitor classes to specify whether their behavior should be recorded.
Don't record refcount adds/releases in nsISupports or nsStringBuffer objects. This is important for avoiding overhead for these classes when not recording/replaying, for reducing recording size, and for avoiding issues where references are directly released from GC finalizers (though there's a separate mechanism for that in a later patch and this might not be an issue anymore).
Don't record some atomic refcounts and other quantities related to graphics surfaces. Because drawing commands are handled in a special way when recording/replaying (see later patches) these quantities might not match up between the two executions.
Lots of JS internal activity does not match up between recording/replay, particularly GC and helper thread behavior and how particular scripts are compiled. This patch avoids recording atomics related to these.
Similarly, use of JS mutexes might not match up between recording/replay. There aren't any JS mutexes whose activity needs to be recorded, so this patch turns off recording for all of them.
Don't record an atomic in the replace-malloc library, to avoid link errors against NSPR.
I think the chaos mode counter can be used directly from GC callbacks, but in any case its behavior doesn't need to be recorded for consistent replay.
Don't record some atomic quantities that NSPR uses internally for condvars and monitors. These quantities aren't related to ensuring consistent record/replay, and if the condvar/monitor is not itself recorded then we don't want to record this ancillary activity either.
The profiler pseudo stack refcounts can be modified while under the GC I think.
Deadlock detectors have an internal lock, and since the detectors can be used with locks that aren't recorded, the internal lock shouldn't be recorded either.
Avoid recording some debugging/statistics atomics, both because doing so isn't necessary and because some of these can be used under the GC I think.
Avoid recording a couple of threading atomics which aren't necessary for consistent record/replay and are accessed under the GC I think.
Avoid recording a monitor which is accessed under the GC.
Incremental GC and finalization aren't yet supported while recording/replaying, mostly because they can happen at different points between the executions and rely on posting runnables for continuing work. Creating and posting runnables requires interacting with parts of the browser whose behavior *is* recorded and replayed exactly. Also, this patch fixes some ways where we could end up performing an incremental GC even if JS::IsIncrementalGCEnabled returns false.
The GC and helper threads keep track of timing for statistics. This patch just causes all these times to be zero when recording/replaying, though an alternative I guess would be to pass through events while getting the current time so that these calls don't interact with the recording.
There are several places where runnables are dispatched from within DOM GC callbacks for continuing GC/CC activity. Since these runnables can't be dispatched from within the GC (for the same reason as incremental GC is disabled) this patch avoids this dispatching when recording or replaying.
Compacting GC can write to a lot of memory and cause large snapshots to be generated while replaying, so this patch disables it in replaying processes. Replaying processes also use raw non-nursery GC pointers as cross-process identifiers (a hack which should be fixed).
PluginUtilsOSX::SetProcessName has a bug where it doesn't check the return value of a CFBundleGetFunctionPointerForName call early enough. When recording or replaying this call always returns null (this is for simplicity, as I haven't seen anywhere we depend on this call returning non-null for important functionality).
The image loader has an array of requests which is sorted by pointer address. Since pointer values can differ between recording and replay, doing this sorting while recording/replaying can cause the replay to diverge from the recording. This patch just doesn't sort the array when recording/replaying, so that requests appear in the array in the order they were inserted. I haven't seen this pattern anywhere else, so haven't tried to generalize it as was done for hash tables.
Finalization witnesses can trigger recorded events / behavior changes from within the GC, so this patch disables them. AFAICT this class isn't used for anything critical (mainly for tests?), but if we do need to preserve its behavior we could do something similar to what happens for nsISupports references released from GC finalizers (see later patch), in which case during replay the witness would be activated at the same point as during the recording (and at a time that may be long after, or even before, the time when the replaying process GC has gotten around to finalizing the associated object.)
Native drawing commands issued directly to the underlying CGContextRef in a DrawTarget are not supported by DrawTargetRecording. Since during replay all drawing is done through a DrawTargetRecording and there is no CGContextRef to speak of, it seems best to just disallow native drawing while recording or replaying.
The replaying process debugger needs scripts to be created in a consistent order, so for simplicity this patch disables lazy and off thread parsing while recording or replaying. Off thread parses also dispatch runnables when they finish, and since off thread parsing happens on JS helper threads they can finish at different points between recording and replay.
The GC nursery collection callback can add information about the GC to some timeline consumer. Doing so can trigger recorded events, so this is disabled when recording or replaying.
Disable runnables which are generated to invoke the debugger after a GC is performed, when recording or replaying.
Since most refcounts are not recorded they might not match up between recording and replay, so this patch disables refcount tracing while recording or replaying.
Disable the background hang monitor while recording or replaying. As for the crash reporter, the hang monitor won't be able to monitor the process state and behave appropriately while replaying an execution.
Disable telemetry while recording or replaying. Telemetry measurements can be added while under the GC.
Media elements are not yet supported while recording or replaying.
DOM workers are not supported yet while recording or replaying.
Accelerated canvases are not supported yet while recording or replaying.
JS signal handlers are not supported yet while recording or replaying.
The slow script dialog is not supported while recording or replaying.
Even though the GC's behavior might differ between recording and replay, other parts of the system should behave consistently. This patch ensures this consistent behavior in three ways: making sure that deferred finalization of GC object references happens at consistent points, making sure that the same set of references is released between recording and replay, and making sure that weak pointers (i.e. from nsWrapperCache) behave the same between recording and replay.
The mMutants set in ShadowLayerForwarder is an std::set whose iteration order may differ between recording and replay. This patch manually instruments uses of this member to ensure consistent behavior, though it's pretty ugly. Alternatives would be to just use a PLDHashTable (which always gives consistent behavior, see later patches) or making a wrapper class like mozilla::set that wraps an std::set and behaves consistently while recording/replaying.
Comment on attachment 8790771 [details] [diff] [review] Part 5a - Disable incremental GC when recording or replaying. Could the parts of this patch that don't involve record/replay be extracted into another patch in a new bug? This sounds really important for Servo, where we currently have incremental GC disabled.
mach_msg is used for both inter-thread and inter-process communication. This patch manually records/replays some calls to this method that are used for IPC, though it might be better to redirect mach_msg itself and intercept all messages that are sent/received between threads as well.
(In reply to Josh Matthews [:jdm] from comment #169) > Comment on attachment 8790771 [details] [diff] [review] > Part 5a - Disable incremental GC when recording or replaying. > > Could the parts of this patch that don't involve record/replay be extracted > into another patch in a new bug? This sounds really important for Servo, > where we currently have incremental GC disabled. Sure, I'll file a new bug once I finish attaching patches here (still a ways to go, alas).
Additional JS instrumentation to mark regions where recorded events should not occur (GC, helper threads, interrupt callback) and where the recording has been invalidated (overrecursion, forced return from the interrupt callback).
Similar to part 8a, manually ensure that hash table iteration behaves consistently when operating on the transferableObjects GCHashSet during structured cloning.
The random number seed generation incorporates the environment and its pointer address. Since this can address vary between recording and replay, don't use the environment in such cases.
Use the record/replay hash table iteration ordering API internally within the PLD and PL hashtable implementations, to ensure that these tables always behave consistently when iterated over by callers.
Graphics code uses locks in shared memory to coordinate management of graphics data. Since the memory can be written by another process, its contents will vary between recording and replay. This patch instruments the classes managing these locks so that the values read while recording will be reproduced during replay.
This patch has the PReplay IPDL protocol (the only protocol by which a middleman and replaying process can communicate) and the parent (middleman) side of the implementation.
Middleman processes have a special PCompositorBridgeChild, MiddlemanCompositorChild, which is used to relay graphics updates from the replaying process to the chrome process. This patch includes external changes necessary to make sure that we don't create a normal CompositorBridgeChild or try to use it to send updates from the middleman's content itself (an about:blank document) to the chrome process.
The middleman process must shut down the replaying process when it itself is being shut down.
The middleman and replaying processes both send messages to each other off the main thread. Some of these messages are synchronous and high priority (to avoid assertions related to normal priority parent->child sync messages) and this runs into an assertion failure in MessageChannel::DispatchSyncMessage. This patch relaxes the assertion.
This patch has the child (replaying process) side of the PReplay protocol.
SharedMemory has an associated memory reporter which is normally created lazily with the first SharedMemory itself. Since replaying processes allocate special shared memory we don't want those memories to interfere with creation of this reporter. This patch adds an API so that the reporter can be created ahead of time at a specific point during both recording and replay.
During replay, the shared memory subsystem can communicate with two other pids. First, there is the original pid which it communicated with while recording, which no longer exists and whose system call results are all replayed from the recording. Second, there is the middleman process' pid which it communicates with for real (thread events are passed through to the system). Communication with these pids happens on different threads, but some state is shared between the two. This patch manages this pretty tricky situation, and also adds an interrupt message so that snapshots can be taken while the thread communicating with the middleman is idle.
This patch adds interrupt logic similar to that used for the shared memory thread, so that the replay specific IPC message pump and message loop threads can go idle and allow a snapshot to be taken.
If it is unable to send an entire message, the IPC message pump can send only part of the message before returning to the outer loop. If we take a snapshot then and restore to it later then we end up sending a torn up message fragment to the middleman, which is bad. This patch avoids returning to the outer loop by busy waiting until the entire outgoing message is sent.
Since JS helper threads never block on a recorded resource, they behave similarly to replay specific threads and need to a special mechanism for being interrupted and going idle so that a snapshot can be taken.
This patch makes sure that communication channels are set up with the middleman process when initializing the replaying process.
Replay specific threads should be invisible to the sampling profiler.
Changes to the C++ JS debugger interface to support using a replay debugger. This includes changes to the debugger object guts (a new REPLAYING slot, different possibilities for the private pointer), changes to the debugger function macros to allow easily forwarding calls to the replay debugger, changes to the handling of scripts to allow for replay-agnostic calls operating on script contents, and refactoring to avoid duplicated logic in the replay debugger. I'm sorry about the size of this patch, unfortunately it was pretty hard to completely separate everything going on here.
This patch has the replay debugger logic, which consists of an object attached to the JS debugger in a C++ process for sending queries to the replaying process state and corresponding logic for answering those queries in the replaying process itself.
This patch has the graphics code using the interfaces in ReplayChild.h to notify the recording/replaying process about IPDL actors and shared memories that are created and compositor updates that are performed, so that the replaying process can operate on these updates and forward them to the middleman process.
This patch has some refactoring to avoid duplicated code in DrawTargetRecordReplay and MiddlemanTranslator.
DrawTargetRecording::GetBackendType() looks wrong to me. Maybe I'm misunderstanding what the backend type here means, but there are a number of places where code checks the backend type of a draw target before static_cast'ing it to the corresponding class (e.g. BorrowedCGContext::BorrowCGContextFromDrawTarget).
The MiddlemanTranslator needs to control the creation of the draw targets that are associated with it, so this patch calls CreateDrawTarget on it directly. I think this is functionally the same as before for PrintTranslator, though maybe it would be better to have a Translator::CreateSimilarDrawTarget?
This patch adds a NativeFontResourceMac class, so that font resources can be created and transported with Mac draw targets and so that DrawTargetRecording can be used on Macs.
Add a RECORDREPLAY graphics backend (which unfortunately has a similar name to the RECORDING backend, despite the two being very different things).
Add a DrawTargetRecordReplay draw target. When recording this wraps a native draw target and passes through thread events whenever calling into it (so that drawing commands are performed without affecting the recording at all). When replaying this does not wrap a draw target at all but reproduces information captured during the recording, and requests rendered data from the middleman when it needs to be flushed.
Have the gfx::Factory create DrawTargetRecordReplay draw targets when recording or replaying, similar to how it already creates DrawTargetRecording targets when necessary.
A large portion (75%+) of recording files can be due to calling into the system to fill in the data associated with fonts. It is a lot more efficient to just record the final generated font data, so this patch manually instruments the font data generation calls to perform this optimization.
Client side changes to the Devtools interface for recording/replaying. This adds new 'Record Execution' and 'Replay Execution' menu items and reverse playing/stepping buttons in the debugger UI. These are gated on the new devtools.recordreplay.enabled being set to true. This pref is off by default, and for now only exists at all on Mac nightly builds.
Add goo so that calling gBrowser.addTab with the right arguments (as the previous patch does) will cause a new recording or middleman process to be spawned and associated with the tab.
Changes to the server side (middleman process) Devtools code for interacting with a replaying debugger.
Last patch!! This adds a basic record/replay mochitest, though I want to write more tests soon.
I filed bug 1302523 for the incremental GC issue. Here is an updated patch which goes on top of the one in that bug.
Attachment #8790771 - Attachment is obsolete: true
Attachment #8790736 - Flags: review?(jwalden+bmo)
Attachment #8790738 - Flags: review?(jwalden+bmo)
Attachment #8790745 - Flags: review?(ehsan)
Attachment #8790747 - Flags: review?(nfroyd)
Attachment #8790748 - Flags: review?(nfroyd)
Attachment #8790751 - Flags: review?(jdemooij)
Attachment #8790754 - Flags: review?(nfitzgerald)
Attachment #8790755 - Flags: review?(n.nethercote)
Attachment #8790756 - Flags: review?(nfroyd)
Attachment #8790765 - Flags: review?(nfitzgerald)
Attachment #8790750 - Flags: review?(bas) → review+
Attachment #8790767 - Flags: review?(nfroyd)
Attachment #8790768 - Flags: review?(nfroyd)
Attachment #8790769 - Flags: review?(nfroyd)
Attachment #8790770 - Flags: review?(nfroyd)
Attachment #8790772 - Flags: review?(sphink)
Attachment #8790774 - Flags: review?(continuation)
Attachment #8790775 - Flags: review?(jcoppeard)
Attachment #8790781 - Flags: review?(b56girard)
Attachment #8790786 - Flags: review?(matt.woodrow)
Attachment #8790781 - Flags: review?(b56girard) → review+
Attachment #8790788 - Flags: review?(nfroyd)
Attachment #8790794 - Flags: review?(jdemooij)
Attachment #8790796 - Flags: review?(continuation)