Closed Bug 1207696 Opened 9 years ago Closed 6 years ago

Web replay initial landing

Categories

(Core :: General, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
firefox44 --- affected

People

(Reporter: bhackett1024, Assigned: bhackett1024)

References

()

Details

(Whiteboard: leave-open)

Attachments

(81 files, 118 obsolete files)

365.40 KB, patch
jandem
: review+
Details | Diff | Splinter Review
1.04 KB, patch
billm
: review+
Details | Diff | Splinter Review
3.50 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
5.21 KB, patch
billm
: review+
Details | Diff | Splinter Review
2.16 KB, patch
ehsan.akhgari
: review+
Details | Diff | Splinter Review
10.19 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
1.83 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
3.57 KB, patch
bas.schouten
: review+
Details | Diff | Splinter Review
22.59 KB, patch
jandem
: review+
Details | Diff | Splinter Review
875 bytes, patch
fitzgen
: review+
Details | Diff | Splinter Review
1.26 KB, patch
n.nethercote
: review+
Details | Diff | Splinter Review
1.44 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
1.18 KB, patch
fitzgen
: review+
Details | Diff | Splinter Review
858 bytes, patch
froydnj
: review+
Details | Diff | Splinter Review
2.39 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
1.92 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
1.15 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
11.32 KB, patch
sfink
: review+
Details | Diff | Splinter Review
2.74 KB, patch
mccr8
: review+
Details | Diff | Splinter Review
1.18 KB, patch
jonco
: review+
Details | Diff | Splinter Review
949 bytes, patch
BenWa
: review+
Details | Diff | Splinter Review
2.92 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
1.63 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
1.15 KB, patch
bas.schouten
: review+
Details | Diff | Splinter Review
2.17 KB, patch
jandem
: review+
Details | Diff | Splinter Review
1.91 KB, patch
mccr8
: review+
Details | Diff | Splinter Review
1.01 KB, patch
fitzgen
: review+
Details | Diff | Splinter Review
991 bytes, patch
froydnj
: review+
Details | Diff | Splinter Review
955 bytes, patch
froydnj
: review+
Details | Diff | Splinter Review
4.72 KB, patch
jesup
: review+
Details | Diff | Splinter Review
1.17 KB, patch
smaug
: review+
Details | Diff | Splinter Review
1.24 KB, patch
dvander
: review+
Details | Diff | Splinter Review
1.25 KB, patch
luke
: review+
Details | Diff | Splinter Review
2.03 KB, patch
mrbkap
: review+
Details | Diff | Splinter Review
5.48 KB, patch
billm
: review+
Details | Diff | Splinter Review
5.87 KB, patch
jandem
: review+
Details | Diff | Splinter Review
4.82 KB, patch
jorendorff
: review+
Details | Diff | Splinter Review
10.13 KB, patch
nical
: review+
Details | Diff | Splinter Review
9.41 KB, patch
nical
: review+
Details | Diff | Splinter Review
1.01 KB, patch
billm
: review+
Details | Diff | Splinter Review
1.60 KB, patch
billm
: review+
Details | Diff | Splinter Review
1.93 KB, patch
n.nethercote
: review+
Details | Diff | Splinter Review
2.94 KB, patch
billm
: review+
Details | Diff | Splinter Review
2.85 KB, patch
fitzgen
: review+
Details | Diff | Splinter Review
3.69 KB, patch
billm
: review+
Details | Diff | Splinter Review
1.12 KB, patch
BenWa
: review+
Details | Diff | Splinter Review
219.10 KB, patch
Details | Diff | Splinter Review
137.62 KB, patch
Details | Diff | Splinter Review
14.80 KB, patch
nical
: review+
Details | Diff | Splinter Review
17.85 KB, patch
bas.schouten
: review-
Details | Diff | Splinter Review
1.08 KB, patch
bas.schouten
: review-
Details | Diff | Splinter Review
1.13 KB, patch
bas.schouten
: review-
Details | Diff | Splinter Review
6.57 KB, patch
jrmuizel
: review+
Details | Diff | Splinter Review
3.59 KB, patch
bas.schouten
: review+
Details | Diff | Splinter Review
5.51 KB, patch
bas.schouten
: review-
Details | Diff | Splinter Review
3.04 KB, patch
bas.schouten
: review-
Details | Diff | Splinter Review
21.61 KB, patch
Details | Diff | Splinter Review
15.70 KB, patch
jimb
: review+
Details | Diff | Splinter Review
3.08 KB, patch
Details | Diff | Splinter Review
1.84 KB, patch
mccr8
: review+
Details | Diff | Splinter Review
2.81 KB, patch
billm
: review+
Details | Diff | Splinter Review
19.86 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
5.34 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
7.05 KB, patch
gfritzsche
: review+
Details | Diff | Splinter Review
1.82 MB, patch
Details | Diff | Splinter Review
32.37 KB, patch
billm
: review+
froydnj
: review+
Details | Diff | Splinter Review
21.30 KB, patch
Details | Diff | Splinter Review
125.45 KB, patch
billm
: review+
Details | Diff | Splinter Review
84.62 KB, patch
billm
: review+
Details | Diff | Splinter Review
73.83 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
254.65 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
27.31 KB, patch
smaug
: review+
Details | Diff | Splinter Review
39.18 KB, patch
billm
: review+
Details | Diff | Splinter Review
41.64 KB, patch
nical
: review+
Details | Diff | Splinter Review
16.04 KB, patch
bas.schouten
: review+
Details | Diff | Splinter Review
22.54 KB, patch
billm
: review+
Details | Diff | Splinter Review
7.59 KB, patch
billm
: review+
Details | Diff | Splinter Review
3.87 KB, patch
billm
: review+
Details | Diff | Splinter Review
16.99 KB, patch
nical
: review+
Details | Diff | Splinter Review
11.28 KB, patch
bas.schouten
: review+
Details | Diff | Splinter Review
22.17 KB, patch
bas.schouten
: review+
Details | Diff | Splinter Review
Over the past couple weeks, roc and I have come up with a design that looks promising for web recording and deterministic replay: an rr like tool that can be integrated into Firefox and used by web and browser developers to debug content processes. Browser non-determinism originates from two sources: intra-thread and inter-thread. Intra-thread non-deterministic behaviors are non-deterministic even in the absence of actions by other threads, and inter-thread non-deterministic behaviors are those affected by interleaving execution with other threads, and which always behave the same given the same interleaving. Intra-thread non-determinism can be recorded and later replayed for each thread. These behaviors mainly originate from system calls (i/o and such), and can be instrumented at the library call level --- this means we can just use shims at library call sites in Gecko code rather than modifying the libraries themselves, and is helpful on Windows, where the library call API is stable/documented but not the system call API. Inter-thread non-determinism can be handled by first assuming the program is data race free: shared memory accesses which would otherwise race are either protected by locks or use mozilla::Atomic. If we record and later replay the order in which threads acquire locks (and, by extension, release locks and use condvars) then accesses on lock-protected memory will occur in the same order. Atomic variables can be handled either by treating reads and writes as if they were wrapped by a lock acquire/release during recording, or by treating reads as intra-thread non-determinism (record and replay the values produced). I like the latter more (less overhead), though it wouldn't work well with atomic pointer values so a mix of the two is probably best (needs experimentation). This design is broadly similar to rr, with some major differences: - This should work on all platforms and architectures supported by Gecko, without much port work required. - This will be part of Gecko itself, rather than a separate tool, which means both that developers won't need additional software to use it and that this can't be used to debug other software. - This can use multiple cores during recording and replay. - This does not preserve exact behavior. Context switches can occur at different times and data races can lead to different behavior between recording and replay. Data races are bugs in and of themselves, however, so this sort of non-determinism should be fixed regardless. There's one exception here that I know of: SharedArrayBuffer can be used by web content to introduce data races to the browser. Pages which use SharedArrayBuffer can still be recorded, but using a different technique from above which will probably use a single core for efficiency's sake. More generally, this design is flexible enough that it could diverge from rr in other ways. For example, it would be really slick if the existing debugger in the browser could be activated during replay without perturbing behaviors which the debugger can observe. This should greatly simplify implementations of a rewind debugger and/or omniscient debugger (build a database of behaviors like DOM/JS object modifications and executed scripts for querying) based on the replayer, but would mean the recording and replay executions could have different pointer values and code paths taken in some places. For now though, this bug is about building a prototype to test how well these ideas work for a basic record/replay system.
Attached patch WIP (obsolete) — Splinter Review
Initial WIP, attaching for posterity. This records NSPR thread/lock creation order and lock acquisition order into files, along with some system calls information for each thread. The record/replay architecture is all inside NSPR, and it seems best for now to keep it there and use NSPR for threads/locks everywhere in the browser (this patch already fixes up thread creation in the chromium message passing stuff to use NSPR) though I don't know how long that strategy will last. This patch intercepts system calls by adding new functions like PR_RecordReplay_kevent and replacing direct system calls with calls to the wrapper. I'm going to rework this strategy; on OS X system calls seem to all go through a userspace library (libsystem_kernel.dylib) whose code can be modified to redirect calls to wrapper functions. This should let system calls within complex libraries be intercepted without modifying the libraries, and moves the design closer to rr. I don't know if this will work on all platforms (or even if it will actually work on OS X) but it should simplify the prototype.
Assignee: nobody → bhackett1024
Attached patch WIP (obsolete) — Splinter Review
Updated WIP that modifies code in libsystem_kernel.dylib and creates trampolines so that system calls through this library can be redirected to a stub that performs the intended call and records information about the result. Right now recording seems to be working fine for all the system calls I can find running the content process on a basic website (nytimes.com) using dtruss. The code modification is pretty hacky (pattern matching disassembly of the code patterns seen in my computer's libsystem_kernel.dylib) but the general approach looks to be basically the same as other hooking tools (e.g. Detours from MSR, http://research.microsoft.com/pubs/68568/huntusenixnt99.pdf). The next step here is to record atomic variable accesses.
Attachment #8666530 - Attachment is obsolete: true
We have a similar hooking system in xpcom/build/nsWindowsDllInterceptor.h; perhaps there may be an opportunity for code re-use.
(In reply to David Major [:dmajor] from comment #3) > We have a similar hooking system in xpcom/build/nsWindowsDllInterceptor.h; > perhaps there may be an opportunity for code re-use. Cool. Yes, CreateTrampoline() in that file is almost shockingly similar to Redirect() in this patch. To reuse this we'll need to expand things to work on other operating systems and architectures but that doesn't seem too hard.
Attached patch WIP (obsolete) — Splinter Review
WIP with handling for atomics. This adds a template argument to mozilla::Atomic to control how much of their behavior is preserved during replay (similar to how the existing MemoryOrdering argument controls the reordering optimizations the compiler is allowed to do wrt the atomic). For now a couple atomics in the JS engine used for interrupts are not recorded at all --- these are accessed all the time, including from jitcode, and recording and replaying executions which are interrupted by the slow script dialog isn't really something that can practically be done (there are a couple other events that will prevent recording I think, like throwing overrecursion errors, but nothing major). All other atomics record enough information to preserve the exact order of their accesses, but there is flexibility to relax that as needed.
Attachment #8668518 - Attachment is obsolete: true
Attached patch WIP (obsolete) — Splinter Review
Updated WIP that fixes a lot of bugs in the record/replay infrastructure. Replaying still balks early during startup; the main issue I'm working on now is how to deal with Grand Central Dispatch (GCD), an OS X API for task parallelism that is built using pthreads and some kernel syscalls that are undocumented and probably shift around between OS versions. Since GCD has its own thread pool, even if we can record its behavior it will be very hard to rewind process state like we will eventually want to do, so I think the best thing here is to intercept calls to the public interface of GCD (which is hopefully fairly stable) and emulate its behavior using our own thread pool and synchronization. I don't know if GCD is used heavily anywhere in browser code; everything I've seen has been deep inside calls to functions in other system libraries.
Attachment #8669236 - Attachment is obsolete: true
AFAIK we don't use GCD in Firefox.
Attached patch WIP (obsolete) — Splinter Review
This patch improves the hooking infrastructure used and is able to intercept all calls into GCD.
Attachment #8678876 - Attachment is obsolete: true
Attached patch WIP (obsolete) — Splinter Review
This patch records a fair amount of GCD's functionality. The next step is to rejigger things so that all the GCD work items run serially on a single thread we create, which will make it straightforward to record execution of the work items. I'm hoping this won't introduce deadlocks due to artificial contention for the work item thread, but we'll see.
Attachment #8679613 - Attachment is obsolete: true
Attached patch WIP (obsolete) — Splinter Review
This patch records enough of GCD's functionality that a simple "Hello world" HTML file can be loaded. Unfortunately, forcing all the GCD work items onto a single thread doesn't work out, due to new deadlocks being introduced, so recorded data for work items is still not suitable for use in replay. Plan B is now to mimic GCD more closely during replay. During recording we keep track of the maximum number of simultaneously executing GCD work items (as well as work items that are executed in place when GCD is called by our own threads), and during replay create our own pool of threads and assign work items to them to maintain the same degree of parallelism. This is more complicated but should be able to replay any GCD execution.
Attachment #8681621 - Attachment is obsolete: true
Attached patch WIP (obsolete) — Splinter Review
This patch should have everything that is needed to record and replay GCD execution in the browser on a simple webpage. The next step is getting back to making replay actually work.
Attachment #8681714 - Attachment is obsolete: true
Attached patch WIP (obsolete) — Splinter Review
This patch fixes a fair number of issues with GCD replay. Attaching this now because I'm going to take a step back and try a new approach. Up to now these patches have been trying to instrument things at as low a level as possible, first at the system call level and then at the GCD level when that library posed problems for replay. The main problem I'm running into now is the same as when everything was at the system level. It's very easy for the replay to go off the rails and start behaving differently, either in a way we can detect (different order of system calls) or just by crashing in the middle of some library. Most of these libraries are closed source and debugging these problems is extremely difficult. More problematic though is that some of these libraries shouldn't even be expected to behave the same during replay. If a library's behavior is affected by thread interleavings --- it passes information between threads, lazily initializes things, etc. --- then the calls it makes won't be consistent between executions. And without modifying the source for such libraries this problem isn't really solvable. Hooking functions like pthread_mutex_lock would help, but a lot of synchronization is done using inlined atomic operations which can't be hooked. So I think a better interface to target might be the one between code compiled into the browser and external libraries the browser is linked against. Libraries which are well behaved and operate in a thread local fashion can still be recorded/replayed at the system call level, but more complex multithreaded libraries can be isolated during recording and then have their observable effects replayed later.
Attachment #8681976 - Attachment is obsolete: true
Brian, can you describe the goal here a little better? If it's to make it easier to gecko debuggers to debug gecko, then I can understand the design. But I don't see how this could be used to create a JS record-and-reply debugger. It seems too low level. If the user wanted set a new breakpoint, it seems like it would perturb our state enough that the trace would no longer be valid. Comment 0 addresses that, but I don't quite understand what the conclusion was.
(In reply to Bill McCloskey (:billm) from comment #13) > Brian, can you describe the goal here a little better? If it's to make it > easier to gecko debuggers to debug gecko, then I can understand the design. > But I don't see how this could be used to create a JS record-and-reply > debugger. It seems too low level. If the user wanted set a new breakpoint, > it seems like it would perturb our state enough that the trace would no > longer be valid. Comment 0 addresses that, but I don't quite understand what > the conclusion was. This should be usable for people debugging Gecko, but it is primarily intended for the browser's debugger and web developers. Whether this will work with the browser's debugger as it currently exists is still speculation, so comment 0 raised the question but that's all it is for now. With a design like rr the state can't be perturbed at all during replay without diverging completely from the recording, so using the browser debugger wouldn't work without massive modifications. This design offers enough flexibility that certain state, like what code is JIT compiled, when GCs occur, how many allocations occur, and so forth should be able (again, speculative) to vary between the recording and replay, in a way that would give the debugger enough room to do its thing. For example, we can't expect to record and replay mmap and mprotect calls exactly, but during replay can keep track of which memory is mapped in which way at snapshots so that we can rewind process state in a sensible way. If the presence of the debugger can affect the existence or timing of operations that are in the recording (for example, finalizers called from the GC which close file descriptors) then we would need to modify either the information we record or Gecko itself to be able to move forward.
Another question: how would we get the browser to the same initial state as the recording started in? Would we save the raw bits in the memory mapping or would we have some kind of serialization/deserialization for the entire heap?
(In reply to Bill McCloskey (:billm) from comment #15) > Another question: how would we get the browser to the same initial state as > the recording started in? Would we save the raw bits in the memory mapping > or would we have some kind of serialization/deserialization for the entire > heap? Right now the focus is on recording as soon as the process starts. We would only record e10s content processes, so from the user's perspective the "start recording" button or whatever would open a new tab in a new process whose behavior could be replayed later (also in a new process). Starting to record in the middle of executing an existing content process would be very cool but I don't know how feasible it is. I'm interested in using a memory snapshot based system for rewinding (this isn't required though, and the record/replay mechanism is largely separate from the eventual rewind and/or omniscient debugging mechanism), so we could potentially (again, all this is speculative) start recording at any point by taking a memory snapshot (would need to know all mapped memory in the process, which might require hooking mmap/mprotect whether we are recording or not), recording until the user desires to stop, then reverting the process to the initial snapshot for replay. Replaying such a partial recording in a new process would be even harder, and may be intractable if ASLR is in use.
Attached patch WIP (obsolete) — Splinter Review
This patch intercepts enough system library calls so that on a simple webpage we never enter GCD without passing through a function or logic whose behavior is recorded (and will not execute at all during replay). Generally this is done by hooking the library functions, though objective C messages are handled via manual instrumentation (I don't know how or whether these message invocations can be hooked). I really like how this approach is going; everything that is hooked is called directly from browser code, these functions are all documented public APIs (GCD has a tricky issue where other system libraries enter it using undocumented private APIs), and this strategy should carry over well to Windows and other OSes.
Attachment #8682744 - Attachment is obsolete: true
Attached patch WIP (obsolete) — Splinter Review
This patch fixes a lot of replay issues. Replay gets much further than in the past, though still not far enough to completely replay a simple webpage load. The new approach is still going very well, with replay issues much easier to track down and fix (since all the involved code is compiled with the browser rather than linked in).
Attachment #8683933 - Attachment is obsolete: true
Thanks Brian, this is really interesting. > Hooking functions like pthread_mutex_lock would help, but a lot of synchronization is done using inlined atomic operations > which can't be hooked. OK, that's good to know. Why do we need to wrap void library functions like, e.g., CTFontManagerSetAutoActivationSetting? Why do we need CopyInstruction? We might be able to get that code from an existing library. I'm pretty sure we could extract the relevant parts of DynamioRio for example. It's not clear to me what state is now being preserved between recording and replay. We'll need a precise definition of that.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #19) > Why do we need to wrap void library functions like, e.g., > CTFontManagerSetAutoActivationSetting? During recording, many of these void functions invoke system calls or other library calls in a way we can't precisely replay. This is mainly due to the libraries being multithreaded and performing synchronization in a way we don't record, but there's also drag due to not having source to these libraries and the ability to understand why replay broke down. During replay, we don't execute much of the code in these libraries, due to their unpredictable behavior. Even if a library API behaves predictably, many of the opaque heap structures (like CFStringRef) are replayed as 0x1 constants so the API will crash if we call it with that fake value. Right now we include information about all these void library functions in the recording, but that isn't necessary --- we could just hook them and nop them during replay but not include any information in the recording. > Why do we need CopyInstruction? We might be able to get that code from an > existing library. I'm pretty sure we could extract the relevant parts of > DynamioRio for example. CopyInstruction started as an expedient hack (it's similar to what nsWindowsDllInterceptor.h does) and then metastasized. At this point it's not much of a drag on development but in the long run it would be much better to get this functionality from a library. > It's not clear to me what state is now being preserved between recording and > replay. We'll need a precise definition of that. Yes, for sure. Right now the ideal is that the replay should preserve all logic (control flow and scalar values, but not pointer values) from the recording except for the logic that goes on inside system calls and Cocoa library calls. There are some exceptions: the replay can GC and Ion compile scripts differently (which has effects on other parts of the system, like nsISupports refcounts differing during replay), and some other library calls are emulated (e.g. localtime_r sometimes uses GCD). There's no effort to make sure we capture all Cocoa entry points, though, and this model is in flux and subject to further evolution. There are a couple main issues: - Not interacting with the system means that nothing will get painted, which we'll of course need to do for the replay to be sensible to users. A first cut might work to just record the bitmap data that is painted (maybe optimized to diff the data on subsequent paints) but I can't imagine that being a viable approach for graphics intensive pages like games. - The issue of attaching the JS debugger still looms. The GC/Ion compilation stuff above should help with this quite a bit, but there are other things like how to interact with the parent process (a separate IPC message loop, presumably) that will cause the replayed process to diverge further from the recording. Brian
Attached patch WIP (obsolete) — Splinter Review
This patch is able to completely replay the load of a simple website. Yay! Of course once the child process exits on its own the parent process thinks it crashed, but whatever. The next step is to get the simple graphics replaying mentioned in comment 20 to work (we already need to precisely record and replay graphics data, since some things like page thumbnails rely on it), which I think will require the parent and child processes to communicate with each other.
Attachment #8685487 - Attachment is obsolete: true
Attached patch WIP (obsolete) — Splinter Review
A fair amount of the previous patch is asserts added to debug replaying problems. This patch removes these.
Attachment #8688706 - Attachment is obsolete: true
Attached patch WIP (obsolete) — Splinter Review
Unless I'm totally misunderstanding things, the content process paints data by rendering layers and then sending updates to the compositor thread in the parent process via IPDL. There are at least a couple different ways to relay updates which the replayed process tries to perform to the actual parent process, but I think the most robust is for the replayed process to be able to communicate with the parent process via IPDL in a manner independent from the original recording. This should work well with future IPC we need to do, like having the parent process control the replay via rewinding, etc., and having the parent process communicate with debugger actors. The attached patch updates things so that during replay we create two new threads: one thread has a chromium I/O message loop which communicates with the actual parent process, and the second thread has a second message loop to process messages coming in from the parent process. These threads operate independently from the threads serving this role in the original recording (IOThreadChild and the main thread), which during replay act in lockstep with how they behaved in the recording and don't do any actual IPC. This patch is far enough along that we create an alternative to ContentChild, ReplayContentChild, which looks like it's communicating with the parent process, though as soon as the parent sends ReplayContentChild a message it forces a crash.
Attachment #8688745 - Attachment is obsolete: true
Separate IPDL threads sound good. However, it might be better to use completely different protocols instead of PContent etc since we have quite different needs.
Attached patch WIP (obsolete) — Splinter Review
This patch models enough of the layout IPDL in the replayed process that during replay we are able to render the graphics for a simple "Hello World!" webpage. I tried using a separate protocol from PContent, but ran into difficulties because it seemed like I would need to modify quite a lot of parent side code to tolerate interactions via multiple protocols (maybe there's a good place to refactor things, I don't know the parent side code at all really). This patch has skeletons for protocols like PContent and PBrowser that ignore most incoming messages, and allocates its own actors for layout protocols like PLayer and PTexture that are 1:1 with actors in the replayed content but interact with the actual parent process. During replay we watch for update messages from the replayed content and reflect those updates into ones for these alternate actors and send them to the parent.
Attachment #8691590 - Attachment is obsolete: true
Attached file test.html (obsolete) —
This is a simple HTML file that uses setTimeout() to count from 1 to 10. Execution of this webpage can be completely replayed, including graphics. From here I want to use this as a testcase to get process rewinding and (afterwards) debugger attaching to work during replay. Combining the two should allow implementation of reverse breakpoints (find the last time a breakpoint executed).
Attached patch WIP (obsolete) — Splinter Review
Updated WIP that can record and replay the test.html above. This just fixes some bugs with the previous patch where shared memory buffers were not locked before being sent to the parent process (which will unlock them, and will get angry if the read count becomes negative) and where mach_msg IPC calls were not being recorded/replayed properly.
Attachment #8694735 - Attachment is obsolete: true
Attached patch WIP (obsolete) — Splinter Review
This patch is functionally the same as the last one but has been rebased to tip and cleaned up somewhat. At roc's suggestion I will break this apart into smaller patches to make it easier to follow and see how other parts of the browser are affected.
Attachment #8695315 - Attachment is obsolete: true
Attached patch Part 1: Link NSPR in more places (obsolete) — Splinter Review
The record/replay infrastructure is in NSPR, so libnspr needs to be linked into more places in the build.
This has the recording/replaying infrastructure for thread/lock creation, lock ordering, atomic access ordering, thread events, function hooking, and alternate implementations of all the hooked functions. Sometime later this will be split up more.
This modifies mfbt/Atomics.h the NSPR atomic interfaces to automatically record and replay atomic accesses. In the former case this is done at a user-specifiable precision, via template parameters.
This patch has the remainder of the changes to NSPR, to keep track of ids on threads and locks and ensure that creation and uses of these are recorded and correctly replayed.
The chromium IPC code currently uses pthreads locking and threading primitives, which are not included in the recording. This patch changes things so that NSPR primitives are used instead.
Attachment #8696738 - Attachment is patch: true
Attached patch Part 6: Parent side changes (obsolete) — Splinter Review
This patch has changes on the parent side for interfacing with recording/replaying child processes.
Attached patch Part 7: Unrecorded atomics/locks (obsolete) — Splinter Review
This patch relaxes certain atomic and lock accesses so that they are not included in the recording. These are mostly related to GCs and JS helper thread behaviors (like Ion compilations), which are permitted to occur at different times in the recording and replay.
This patch changes the child process's behavior in a few places when recording or replaying is active, either to simplify recording/replaying or to avoid messy interactions with replay specific threads.
Most child side instrumentation of library calls is done via the hooking infrastructure in part 2, which doesn't require changing the code which calls those libraries at all. Code that interacts with libraries by sending mach messages needs to be manually instrumented, however, which this patch does. This is the bulk of the child side changes needed to get replay to work. I'm hoping that there is a way to do this so that the code sending the mach messages doesn't need to be modified, but I haven't yet looked into how mach messages work and whether this is possible.
Attachment #8696745 - Attachment is patch: true
This patch has the remaining manual child side instrumentation needed to get replay to work.
Attached patch Part 11: Replayed process IPC (obsolete) — Splinter Review
This patch has the changes needed to allow the replayed process to communicate with the actual parent process via IPDL messages and shared memory. It also has the implementation for reflecting layers/compositing messages from the child into messages which can be sent to the parent process; this stuff will be split off into another patch some time later.
This patch includes the graphics code side of the layers/compositing message reflection. When IPDL actors and shared memory regions are created, or update messages are sent, the replay IPC code is notified so that equivalent structures or messages can be created/sent to the actual parent process.
Thanks Brian! We discovered people already working on a similar project. See bug 1231802.
Attached patch rewind WIP (obsolete) — Splinter Review
WIP for rewinding process state. This goes on top of the previous patches. Right now this records a series of snapshots as the program executes, one early on and (for now) later ones each time a compositor update is sent to the parent. Each snapshot after the first is a diff of heap and static memory vs. the previous snapshot --- the mach_vm_region syscall is used to find all mapped memory in the process at the first snapshot, and then mprotect and an asm.js-style signal handler are used to find regions modified before the next snapshot. The snapshots also include stacks and other state for each thread, recorded and restored with setjmp/longjmp and some inline assembly. Currently we can restore an earlier snapshot, though the main thread crashes shortly after doing so, so there's still work to do.
Attached patch rewind WIP (obsolete) — Splinter Review
Rewinding in this patch works well enough that we can run to the point where a compositor update is first sent to the parent, rewind to when the initial snapshot was made, and continue running forward to the same point and rewinding ad nauseam. The main tricky bit to deal with has been that after rewinding all heap memory naively, pthread condition variables do not work right. I think that state in the process and state in the kernel are getting out of sync, but need a better understanding. For now the condition variables are allocated in a memory region whose contents are preserved when rewinding. The two main remaining steps for rewinding are (a) that we need to be able to rewind compositor state as well (right now the special replay threads for doing IPC and compositing are disabled if we want to rewind) and (b) that the parent process should be able to control when and how the child process is rewound, via a parent side developer toolbar I guess.
Attachment #8698587 - Attachment is obsolete: true
Attached patch rewind WIP (obsolete) — Splinter Review
This patch enables the replay specific IPC threads to run while snapshots are taken and restored. The basic strategy is the same as with the other threads which are playing back from the recording --- save and restore the contents of their stacks and all memory they use. Since these threads are working with live file descriptors, however, there is at least the potential for things to get all messed up, but by waiting for all needed descriptors/channels to be fully initialized before the first snapshot is taken this strategy seems to work alright. (If this isn't viable in the long run there are more drastic options like tearing down the replay specific threads before taking a snapshot and rebuilding them afterwards, but I'd rather avoid this if possible.) This patch gets far enough along that we can run to the end of the page's execution, rewind to the beginning, and then cause the parent to crash when we resend it messages for creating IPDL actors it already knows about. The next step will be modifying the messages sent to the parent so that the compositor is able to maintain a coherent state even after the child process rewinds itself.
Attachment #8698921 - Attachment is obsolete: true
Attached patch rewind WIP (obsolete) — Splinter Review
This patch is able to show graphics after rewinding; on the test page it runs to the end, rewinds to the beginning, then shows updates as it plays forward to the end again. This is managed by having the replay create all the graphics IPDL actors it will need up front (before the first snapshot is taken) using information gathered during the recording. When rewinding or fast forwarding, then, all the actors needed to construct a coherent state in the parent process are already in place. There still isn't anything complicated going on with reconstructing the compositor state which existed at some time when rewinding, as this page is so simple that all the graphics are contained in one or two tiles. Will wait on improving this until we start looking at more complicated pages. This patch also adds some more ignored locks and atomics so that GCs and CCs can occur at different times in the recording and replay (something I thought I did earlier but didn't actually do), so it's a good deal larger than the last patch.
Attachment #8700063 - Attachment is obsolete: true
I've started looking at the devtools JS debugger stuff and how to attach breakpoints in the replayed process. I think the design here will need a little reworking. I want to be able to use the C++ JS debugger (js/src/vm/Debugger.* and some associated bits) in the replayed process, but devtools runs quite a bit more javascript code in the client process, whose results are collated and sent to the parent process in the form of (I think) JSON objects. I'd like to leave this JS code alone as much as possible, but (a) allowing the replayed process to execute JS code that didn't run in the recording could really mess up the replay, and (b) rewinding the process could really confuse the server without some substantial modifications. It would be better I think if the devtools JS ran in a separate middleman process from the replayed process. This would allow the devtools JS to do whatever it wants without fear of damaging the replayed process state, and all its structures and state would be preserved when the replayed process rewinds or fast forwards itself. The design would be: - The parent process communicates with the middleman process via the normal PContent/PBrowser/etc. IPDL protocols. - The middleman process communicates with the replayed process via special replay protocol(s) that are tailored to the operations the replayed process can perform. The parent is largely unchanged from its current behavior. In addition to running the devtools JS, the middleman process manages changes in state when the replayed process rewinds/fastforwards so that the parent process and devtools JS have a coherent representation of that state. For example, when playing forward the middleman will keep track of compositor updates so that if the process rewinds later it can reconstruct that earlier state by sending messages to the parent. WRT the devtools JS, whenever the replayed process takes a snapshot the C++ JS debugger should not be active, but the destruction and reconstruction of these C++ constructs should be transparent to the devtools JS when playing forward through a snapshot or rewinding/fastforwarding. The middleman process should be able to manage this transparency. Both this and the compositor stuff could in principle be done in the replayed process, but it's much harder to do there due to the nature of the process rewinding --- all state that should be preserved across rewinding needs to be stored either on disk or in specially allocated memory regions. The middleman process doesn't really need to be a separate process from the parent; the parent could instead start a message loop thread and I/O thread which do all the work which the middleman does. I think it will be easier to implement as a separate process, though, and because IPDL can be used for both inter-process and inter-thread communication, changing the middleman to run in the parent down the road (if desired) should be pretty painless.
Attached patch rewind WIP (obsolete) — Splinter Review
This patch does the same stuff as the last one, but does so using a middleman process as described in comment 46. Overall I think this is a definite improvement, and will provide a lot more flexibility going forward. There is more boilerplate and overhead to marshal compositor updates from the replayed process into the middleman and then into the parent process, but the middleman is just a lightly modified Gecko content process --- it defers to the replayed process for compositor updates, but can do anything else a normal content process can do without worrying about how the process replay will be affected. I forgot to say earlier, but using the middleman also gets us to the separate protocols idea from comment 24. Instead of the replayed process having a skeleton PContentChild/PBrowserChild/etc. implementation, the middleman process has standard versions of these, and the middleman and replayed processes communicate with a specialized PReplay protocol.
Attachment #8701073 - Attachment is obsolete: true
(In reply to Brian Hackett (:bhackett) from comment #46) > I've started looking at the devtools JS debugger stuff and how to attach > breakpoints in the replayed process. I think the design here will need a > little reworking. I want to be able to use the C++ JS debugger > (js/src/vm/Debugger.* and some associated bits) in the replayed process, Is the goal to use all (most? the ones that make sense? read only?) of the devtools during replay? Because while a lot of functionality is built on the Debugger API, just as much is built on a random assortment of other XPCOM APIs and web APIs themselves (such as the "inspector" that lets one view and edit the dom tree and the styles applied to each node directly). Point being that just re-implementing the Debugger API is not enough to get most of the devtools. Even just the JS debugger itself may need more. > but devtools runs quite a bit more javascript code in the client process, > whose results are collated and sent to the parent process in the form of (I > think) JSON objects. Yes, the devtools run a _ton_ of JS. Almost all of the devtools remote debugger protocol server (which lives in the debuggee content process) is JS. Yes, the remote debugger protocol uses JSON for the most part, but there is also the "bulk data" packets used for arbitrary binary data which are implemented on top of streams directly. For more info on the protocol itself: https://wiki.mozilla.org/Remote_Debugging_Protocol Request and response messages are sent over a "transport". We have a `ChildDebuggerTransport` for debugging the child process from the parent process, which seems most relevant here. See https://dxr.mozilla.org/mozilla-central/source/devtools/shared/transport/transport.js#699-712 If there are any devtools or Debugger API related questions I can answer, please ask away :)
(In reply to Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8] from comment #48) > (In reply to Brian Hackett (:bhackett) from comment #46) > > I've started looking at the devtools JS debugger stuff and how to attach > > breakpoints in the replayed process. I think the design here will need a > > little reworking. I want to be able to use the C++ JS debugger > > (js/src/vm/Debugger.* and some associated bits) in the replayed process, > > Is the goal to use all (most? the ones that make sense? read only?) of the > devtools during replay? Because while a lot of functionality is built on the > Debugger API, just as much is built on a random assortment of other XPCOM > APIs > and web APIs themselves (such as the "inspector" that lets one view and edit > the > dom tree and the styles applied to each node directly). Point being that just > re-implementing the Debugger API is not enough to get most of the devtools. > Even > just the JS debugger itself may need more. Right now the goal is to just use the JS debugger during replay. Eventually, it would be nice to use all the devtools that make sense for a replay debugger (Everything except performance/network? In principle even these two could make sense for the replay debugger but the overhead of recording could perturb them a lot). Everything would be read only to the devtools, since changing state in an observable way can cause the replay to diverge from the recording. Accesses to XPCOM objects in the replayed process can still be supported; instead of having XPConnect objects which call into XPCOM directly, the devtools code will have proxies whose accesses are resolved by sending synchronous IPDL messages to the replayed process. The patch I'm working on now does this for things like the doc shell and DOM window, and the result is fairly transparent to the devtools code (I have been modifying the devtools code to handle foobar.QueryInterface(), which could maybe be done transparently but would be pretty gnarly under the hood). I don't know yet whether this will be enough to support the DOM inspector. The main question is whether the information the inspector is getting --- the boundary/padding for an element, the element which is found by a right click -> Inspect Element, etc. --- can be determined solely from the replayed process' document tree and layout state, without making calls into the system graphics libraries. > If there are any devtools or Debugger API related questions I can answer, > please > ask away :) Sure, thanks!
Attached patch debugger WIP (obsolete) — Splinter Review
This patch adds some integration with the JS debugger. It does enough that one can run the replayed process, and when it finishes open the debugger UI, see the sources in the replayed process, set a breakpoint on a line, click on a new 'rewind' UI button, have the replayed process rewind to the last point where the breakpoint was hit, and finally have the middleman process notify the chrome process about the pause so it can update the UI appropriately. Not much else works --- when paused we don't yet provide properties on the environment or visible objects to inspect --- but I feel this suffices as a proof of concept, that we can attach the debugger and have it inspect state in the replayed process without perturbing the replay. The prototype is, for lack of a better word, done. Things I'd like to do next: - Post a design document about the project which describes its organization, technical features, and the invariants it is maintaining. Is MDN the best place for this? - Post a cleaned up, rolled up patch and a new patch series breaking it down. - Start recording/replaying more complicated pages. The robustness of the various oarts of this project is I feel the biggest unknown at this point.
The best place for a design doc is probably wiki.mozilla.org.
(In reply to Boris Zbarsky [:bz] (Vacation until Jan 4) from comment #51) > The best place for a design doc is probably wiki.mozilla.org. My understanding is that MDN is best for code-related stuff, and wiki.m.o is for stuff relating to people, teams and planning. E.g. from https://wiki.mozilla.org/MozillaWiki:About: > To summarize: MDN documents how to interact with code. The Mozilla wiki documents how to interact > with teams. SUMO documents how to interact with Mozilla products (as an end-user).
Huh. It used to be that MDN was for documentation for how people outside the Mozilla project should interact with our code (e.g. the web developer documentation) and things like https://wiki.mozilla.org/Gecko:Overview lived... well, where it's living. I wonder when that was supposed to have changed.
Attached patch WIP (obsolete) — Splinter Review
Rolled up patch, rebased to tip and cleaned up in some places.
Attachment #8695312 - Attachment is obsolete: true
Attachment #8696702 - Attachment is obsolete: true
Attachment #8696730 - Attachment is obsolete: true
Attachment #8696731 - Attachment is obsolete: true
Attachment #8696733 - Attachment is obsolete: true
Attachment #8696735 - Attachment is obsolete: true
Attachment #8696738 - Attachment is obsolete: true
Attachment #8696740 - Attachment is obsolete: true
Attachment #8696742 - Attachment is obsolete: true
Attachment #8696743 - Attachment is obsolete: true
Attachment #8696745 - Attachment is obsolete: true
Attachment #8696746 - Attachment is obsolete: true
Attachment #8696750 - Attachment is obsolete: true
Attachment #8696752 - Attachment is obsolete: true
Attachment #8701885 - Attachment is obsolete: true
Attachment #8704372 - Attachment is obsolete: true
Attached patch Part 1: Build system changes (obsolete) — Splinter Review
Link NSPR in more places, and perform all other build system changes (some of which don't fit cleanly into any of the remaining patches).
This has the record/replay infrastructure from the earlier patch, plus the rewind infrastructure, which is pretty closely intertwined. The hooking parts of the record/replay infrastructure have been separated into a new file.
The atomics interface changes are generally unchanged from the earlier patch.
The NSPR locking/threading changes are also very similar to those in the previous patch.
Changing over the chromium IPC code to use NSPR primitives is pretty much the same as in the earlier patch.
Attached patch Part 6: Parent side changes (obsolete) — Splinter Review
Chrome process changes are even more minimal now that during replay it is interacting with an almost normal content process.
Attached patch Part 7: Unrecorded atomics/locks (obsolete) — Splinter Review
The amount of unrecorded atomics/locking behavior is more expansive now that we're making sure no activity is recorded during GC/CC.
Behavior changes on the side of the recording/replaying content process vs. a normal content process are also more extensive now. A lot of this is disabling logging/etc. code during record/replay, which is easier to do than to make sure the logging does not affect the recording. It also includes some things like avoiding posting runnables during a GC, which is preventing incremental GC/CC from being used during recording/replaying and is a restriction I would like to fix.
This is pretty much unchanged from the earlier patch.
The additional instrumentation required to get replaying to work is also pretty much unchanged from the earlier patch.
This patch has the PReplay IPDL protocol and the middleman process side of its implementation, as well as other middleman specific code for avoiding loading documents / rendering graphics of its own, and for conveying graphics updates from the replaying process to the chrome process and debugger messages from the middleman's JS debugger to the replaying process.
Attached patch Part 12: Replaying process IPC (obsolete) — Splinter Review
The other side of the PReplay protocol, which lives in the replaying process. This patch conveys graphics updates to the middleman process, and debugger messages from the middleman process to the replaying process side of the JS debugger. It also has some modifications to the IPC code so that the replaying process can pause replay specific threads before taking or restoring a snapshot.
Attached patch Part 13: C++ JS debugger changes (obsolete) — Splinter Review
Changes to the C++ JS debugger to accommodate the replay JS debugger. Currently, all debug objects/scripts/etc. in a Debugger instance are wrappers around a JS object/script/etc. in another compartment. This patch allows the Debugger to have an associated ReplayDebugger, which stores different data in the debug objects/scripts/etc. and has alternate implementations for the various methods on them. This patch has the Debugger code to manage this dichotomy; the actual implementation of ReplayDebugger is in the next patch.
Attached patch Part 14: C++ JS replay debugger (obsolete) — Splinter Review
This patch has the implementation of ReplayDebugger on the middleman and replaying process sides; whenever the Debugger needs information about an object/script/etc. it calls a method on its ReplayDebugger which either gets the information from local state or calls a hook to send an IPDL message to the replaying process, which when received calls yet a separate hook to go back into ReplayDebugger code and compute the answer to the query.
As in the earlier patch, this includes spots where the replayed portion of the replaying process calls into IPC code so that a message can be sent to the parent process. The format of graphics messages in the PReplay protocol are different, however, from those in PLayerTransaction (layers, textures, etc. are referred to using indexes, rather than actors) which required some refactoring of the layer messages to avoid duplicate code; this patch also includes changes to the graphics code to handle this refactoring.
This patch adds a rewind button to the debugger client's UI, which when pressed sends a message to the debugger server living in the middleman process.
On the server side of the devtools code, this patch has changes to receive the rewind button event and pass it onto the C++ JS debugger. It also has some other changes needed for replaying, mainly for interacting with the proxies created for the docshell/window/etc. in the replaying process, which don't (yet?) support generic QI calls. This is the last patch in this series.
The middleman process design sounds like a really good idea. Thanks!
Thanks for the MDN document. It's really good. We might be able to simplify the graphics situation by switching rendering to use Skia or the cairo image backend instead of CoreGraphics on Mac. I assume the issue of asm.js's mprotect handlers interfering with WebReplay's can be fixed. A few things we need to figure out soon-ish: 1) Can this work on Windows? It might be a good idea to have someone try porting it to Windows who's not you. 2) How much work is it to extend to more browser features? 3) What's the long-term maintenance burden going to be? 4) Whether we really want direct debugging of a replayed process or we should target an omniscient approach. I haven't done anything here yet...
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #74) > Thanks for the MDN document. It's really good. > > We might be able to simplify the graphics situation by switching rendering > to use Skia or the cairo image backend instead of CoreGraphics on Mac. This would still require using CoreGraphics for things like rendering fonts, right? I don't think that intercepting the CoreGraphics routines is the main problem (CG is a well documented public API), but my biggest concern right now is the overhead of recording graphics data produced by CG. I'll attach a patch and some more info on this in a bit. > I assume the issue of asm.js's mprotect handlers interfering with > WebReplay's can be fixed. Yes. > A few things we need to figure out soon-ish: > 1) Can this work on Windows? It might be a good idea to have someone try > porting it to Windows who's not you. I don't know of anything that would prevent this from working on Windows. Recording/replaying is all based on hooks, which we already use in other places on Windows, but the rewinding infrastructure does more systemy things like crawling the virtual address space and using a signal handler. These things look like they have analogues on Windows, e.g. VirtualQuery looks functionally similar to mach_vm_region. If anyone wants to work on this that would be great, otherwise I can, though probably not until March. I'll be on PTO most of February. > 2) How much work is it to extend to more browser features? I'll post another patch in a bit, but it took a few days to get from recording/replaying the text only test page to recording/replaying the wikipedia home page, which does new things like loading images, using CSS, having a complex page layout, etc. The main feature I'm concerned about is WebGL, and I hope we can handle it by just hooking a bunch of new API functions and doing something similar to what we're doing with CoreGraphics. > 3) What's the long-term maintenance burden going to be? I think there are three categories to consider: - Within the record/replay system, the main burdens should be for handling breakages with new OS versions and for hooking new APIs being called by the browser. - The glue code for replay IPC, graphics messages, and debugger integration would need to be updated to continue working after changes to those browser components. - In other parts of the browser that don't directly interact with the record/replay mechanism, I would like to minimize changes as much as possible, which should also minimize the maintenance burden. I think things are pretty good right now, though the mach message intercepting is still pretty ugly and there are some other improvements that would be nice to make, like allowing the GC to work the same way whether recording/replaying or not.
Attached patch robustness WIP (obsolete) — Splinter Review
With this patch, the main page for Wikipedia can be recorded and replayed. As noted above, while this is a pretty static page it uses a lot more browser features than the basic test page did. The changes in this patch are mostly internal reorganization/improvements in the record/replay code. Other changes include some new APIs being hooked, messages being intercepted, and several hash tables that need to behave deterministically (the handling for these hash tables is pretty low level currently, I want to fold these into the hashtable classes themselves at some point).
Attached patch graphics WIP (obsolete) — Splinter Review
This patch is still pretty preliminary but illustrates what I'm hoping to with graphics data. As I understand it the tile based renderer creates a bunch of CGBitmapContexts which render graphics updates to shared memory buffers managed by IPDL textures. Right now we handle these during record/replay by keeping track of API calls which will change the rendered data, and then at points where the data is needed (e.g. before sending a graphics update to the parent) recording the contents of the graphics data buffers so they can be restored later. This is simple and it works but it produces an unacceptable amount of data. When loading Wikipedia the size of the recording is 176MB, 88% of which is graphics data. If I load Wikipedia, scroll to the bottom and then back to the top, the size of the recording is 477MB, 95% of it graphics data. This patch leverages the hooking of the CoreGraphics and other Core APIs to avoid including any of this graphics data in the recording. When the process is replaying, when these calls are performed they are recorded in a separate binary stream, and when a graphics update is sent to the middleman that binary stream is also sent to the middleman, so that it can perform the graphics calls itself and render graphics to the memory buffers it shares with the replaying process. It's possible that the middleman could render the graphics data directly to the shared memory buffers it shares with the chrome process, eliminating the need for sharing graphics buffers with the replaying process, but this would require that the replay work without ever accessing the graphics data and that is not currently the case. Currently this works with a simple 'Hello World!' page. Handling more complicated pages should only require improving this binary stream to work with other APIs that are already being hooked.
> > 2) How much work is it to extend to more browser features? > > I'll post another patch in a bit, but it took a few days to get from > recording/replaying the text only test page to recording/replaying the > wikipedia home page, which does new things like loading images, using CSS, > having a complex page layout, etc. The main feature I'm concerned about is > WebGL, and I hope we can handle it by just hooking a bunch of new API > functions and doing something similar to what we're doing with CoreGraphics. I'm also worried about the media stack, and third-party code like video codecs and the WebRTC stack that use system APIs and custom threading. (In reply to Brian Hackett (:bhackett) from comment #76) > With this patch, the main page for Wikipedia can be recorded and replayed. > As noted above, while this is a pretty static page it uses a lot more > browser features than the basic test page did. The changes in this patch > are mostly internal reorganization/improvements in the record/replay code. > Other changes include some new APIs being hooked, messages being > intercepted, and several hash tables that need to behave deterministically > (the handling for these hash tables is pretty low level currently, I want to > fold these into the hashtable classes themselves at some point). Yeah, iterating nsPtrHashKey tables was obviously going to be an issue. It might be a good idea to just eliminate that, independently of record/replay. Your graphics recording approach might be able to use Moz2D's CreateRecordingDrawTarget, which in principle does what you need.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #78) > Your graphics recording approach might be able to use Moz2D's > CreateRecordingDrawTarget, which in principle does what you need. Great, I will try doing this. Ideally all the CoreGraphics interaction would be behind the DrawTarget and other 2D abstractions, so that we don't need to hook the graphics functions at all. Instead different draw targets would be used, so that while recording the CG target is active but while replaying a recording target is active whose events are forwarded to a CG target in the middleman process. This design should work on other platforms as well.
Attached patch graphics WIP (obsolete) — Splinter Review
This patch expands on recording/execution of graphics commands so that during replay of the home page for Wikipedia the entire page is rendered correctly. Next I'll try interposing DrawTargetRecording as described above to do the same thing.
Attachment #8707023 - Attachment is obsolete: true
Attached patch graphics WIP (obsolete) — Splinter Review
This is a replacement for the previous graphics patch which uses DrawTargetRecording instead of hooking CoreGraphics API calls to perform drawing in the middleman instead of the replaying process. During both recording and replay a DrawTargetRecording is active; when recording the DrawTarget calls are forwarded to the normal native DrawTarget as is normally done by DrawTargetRecording, and when replaying the DrawTarget calls are simply recorded, with no underlying native DrawTarget. This allows many of the CoreGraphics hooks to be removed. With this patch, the replaying process is able to render the Wikipedia homepage. There are still a couple issues: - Font data consumes over 80% of the recording data on Wikipedia, and it would be better to have an efficient way of encoding them between recording/replay. The earlier patch just recorded the full names of any system fonts being created, and doing something similar here when possible would be good. - nsNativeThemeCocoa::DrawWidgetBackground gets the CGContext from a DrawTarget and draws things directly to it, for what looks like native widgets like checkboxes and such. These uses won't be captured by the DrawTargetRecording, though this is a more general issue with DrawTargetRecording and not specific to record/replay (AFAICT other than here DrawTargetRecording is not used on Macs, e.g. before this patch there was no way for the decoder to construct fonts for drawing on platforms other than Windows).
Attachment #8707484 - Attachment is obsolete: true
Attached patch record font names (obsolete) — Splinter Review
This patch records full font names when available from the CGFontRef instead of the font's tables, bringing the total size of the recording on Wikipedia down to around 22 MB (which is where it was with the earlier CoreGraphics based patch).
Attached patch robustness WIP #2 (obsolete) — Splinter Review
This patch goes on top of the previous ones and includes various fixes to improve robustness. With this patch I can record and replay a simple GMail session --- from a fresh profile, the login page is loaded, I log in, and the inbox loads and displays. The total recording size is about 40 MB, which seems reasonable and should improve if we start doing in memory recording compression. Everything about the page looks right, except for animations that stay on their initial frame (e.g. the avatar that shows up on the login screen). I don't know what to do about these yet, especially when they are based on absolute time stamps and when the debugger is active on these pages. Anyways, the main new issue this patch exposed is what to do about places where non-deterministic parts of the browser interact with deterministic parts. In particular, there are several nsISupports classes --- nsGlobalWindow, HTMLImageElement, and so forth --- that can be destroyed during GC and trigger recorded events in their destructor, and cross compartment wrappers that are destroyed during GC sweeping can trigger recorded events when they are regenerated later. There are a few ways these could be handled: A) Move the recording boundary so that recorded events are never triggered by these operations. This is conceptually the cleanest, and is just an extension of what we already do with e.g. atomic refcounts and other quantities that change during GC. It will probably require the most work, though, as some of these destructors do things like posting events to the main thread's event loop and sending IPDL messages, which will need new infrastructure to separate out the deterministic and non-deterministic portions of these other systems. B) Ensure that destruction of the data in these classes are performed consistently, using an event on the main thread that fires every second or so. During recording, objects which can trigger events have their destruction delayed until this event next fires. During replay, keep the corresponding objects alive until the same event firing, and then either destroy them or clean up their internal data, depending on whether they still have other references. I tried to do this approach, but don't like it that much. C) Force these objects to behave deterministically, by simply never destroying them. This is of course the simplest approach and is the one this patch takes. It might be viable in the long run, since recordings are expected to be relatively short lived, but it still doesn't seem ideal. Probably the best strategy is a mix of A) and C). Start by just not destroying such objects in a non-deterministic fashion, and then find some minimally invasive ways to improve Gecko so that the destruction of these objects does not interfere with the recording and can happen as it normally would.
Attached patch Windows WIP (obsolete) — Splinter Review
This patch makes enough changes so that the build works on Windows. It doesn't do anything else, but since this patch is so large (mostly from code reorganization and renaming variable types to appease cl) it seems better to split off from changes with actual substance.
Attached patch Windows WIP #2 (obsolete) — Splinter Review
This patch goes on top of the previous one and improves the Windows build so that all thread/locking activity is recorded and so that the hooking infrastructure works. Right now this just hooks a few functions, the next step is to expand this to a more complete set.
FWIW, I just noticed that there's a TimeTravelDebugging branch of https://github.com/Microsoft/ChakraCore/commits/TimeTravelDebugging. I haven't looked at their implementation and how it compares to this one.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #86) > FWIW, I just noticed that there's a TimeTravelDebugging branch of > https://github.com/Microsoft/ChakraCore/commits/TimeTravelDebugging. I > haven't looked at their implementation and how it compares to this one. I looked a bit at this. It seems like they can take a snapshot of the heap and then they also maintain an event log that they can replay. An event can be something like "Date.now called, returned value X". Here's a video where they explain how it works (I didn't watch it completely yet): https://channel9.msdn.com/blogs/Marron/Time-Travel-Debugging-for-JavaScriptHTML
Attached patch WIP (obsolete) — Splinter Review
Updated rebased and rolled up patch. Back in early February I did a fair amount more on windows, adding hooks for many of the kernel32 interfaces, but things don't work yet. From here I'm going to change tack a bit, mostly working on Mac and improving the quality of the patch and robustness of the technology, aiming towards getting a solid, interesting demo ready. I'll continue working on improving the windows side of things, though with lower priority.
Attachment #8705209 - Attachment is obsolete: true
Attachment #8707012 - Attachment is obsolete: true
Attachment #8708541 - Attachment is obsolete: true
Attachment #8708687 - Attachment is obsolete: true
Attachment #8711036 - Attachment is obsolete: true
Attachment #8711510 - Attachment is obsolete: true
Attachment #8713640 - Attachment is obsolete: true
Attached patch WIP (obsolete) — Splinter Review
Updated rolled up patch. This makes several improvements to the basic record/replay infrastructure --- records are compressed in memory while being written to disk, limits on the number of threads and locks and so forth that can be tracked are removed, hash tables are used instead of vectors for associating information with objects. It also fixes some things so that executions can be recorded/replayed with a --disable-debug --enable-optimize build.
Attachment #8727462 - Attachment is obsolete: true
(In reply to Jan de Mooij [:jandem] from comment #87) > (In reply to Jeff Muizelaar [:jrmuizel] from comment #86) > > FWIW, I just noticed that there's a TimeTravelDebugging branch of > > https://github.com/Microsoft/ChakraCore/commits/TimeTravelDebugging. I > > haven't looked at their implementation and how it compares to this one. > > I looked a bit at this. It seems like they can take a snapshot of the heap > and then they also maintain an event log that they can replay. An event can > be something like "Date.now called, returned value X". > > Here's a video where they explain how it works (I didn't watch it completely > yet): > > https://channel9.msdn.com/blogs/Marron/Time-Travel-Debugging-for- > JavaScriptHTML It looks like there are some substantial differences between the approach in this bug and the approach described in this video. The most important one is the recording boundary. In this bug we aim to reproduce the behavior of all code compiled as part of the browser (i.e. excluding system libraries which are called into), modulo things like garbage collection and refcounts which aren't replayed exactly. The approach described in the video compartmentalizes the JS engine so that its non-deterministic behaviors and inputs (like event handler invocations) are recorded and replayed, but not other portions of the browser. Late in the video they are able to rewind state and replay the behavior of a JS game that manipulates a canvas element, but if the DOM structure were to change during the course of the recording then I don't think they would be able to replay those changes or to reconstruct the earlier DOM when rewinding.
Attached patch WIP (obsolete) — Splinter Review
Rolled up patch with the last couple weeks of work. This changes how locks and cvars work during replay so that instead of using the native pthreads code, file descriptor pipes are used by threads to signal each other. This avoids making special pthreads related system calls, which don't seem to work reliably on OS X after rewinding process state, and is also more efficient, as threads can signal one another when acquiring locks in the recorded order without needing to acquire a particular highly contended global lock. This also fleshes out the debugger quite a bit, so that state can be inspected while paused at a breakpoint and the UI can be used to rewind or advance across all the points where a breakpoint is hit. The next, well, step is to get the stepping handlers and their rewinding counterparts to work.
Attachment #8730218 - Attachment is obsolete: true
Attached patch WIP (obsolete) — Splinter Review
Rolled up patch, rebased to tip. It's been a while, so I'll update the individual patches which go into this one.
Attachment #8705210 - Attachment is obsolete: true
Attachment #8705212 - Attachment is obsolete: true
Attachment #8705213 - Attachment is obsolete: true
Attachment #8705214 - Attachment is obsolete: true
Attachment #8705215 - Attachment is obsolete: true
Attachment #8705217 - Attachment is obsolete: true
Attachment #8705219 - Attachment is obsolete: true
Attachment #8705223 - Attachment is obsolete: true
Attachment #8705225 - Attachment is obsolete: true
Attachment #8705228 - Attachment is obsolete: true
Attachment #8705230 - Attachment is obsolete: true
Attachment #8705231 - Attachment is obsolete: true
Attachment #8705236 - Attachment is obsolete: true
Attachment #8705237 - Attachment is obsolete: true
Attachment #8705244 - Attachment is obsolete: true
Attachment #8705246 - Attachment is obsolete: true
Attachment #8705256 - Attachment is obsolete: true
Attachment #8736310 - Attachment is obsolete: true
Attached patch Part 1: Build system changes (obsolete) — Splinter Review
Attached patch Part 1b: Move LZ4 to NSPR (obsolete) — Splinter Review
The record/replay infrastructure now uses LZ4 to compress records in memory as they are written to disk, so this patch moves the in-tree version of LZ4 from mfbt to nspr.
This patch has the system/library call redirections for Mac and (still incomplete) Windows, and the infrastructure for enabling redirections.
Attached patch Part 6: Parent side changes (obsolete) — Splinter Review
Attached patch Part 7: Unrecorded atomics/locks (obsolete) — Splinter Review
Attached patch Part 12: Replaying process IPC (obsolete) — Splinter Review
Attached patch Part 13: C++ JS debugger changes (obsolete) — Splinter Review
Attached patch Part 14: C++ JS replay debugger (obsolete) — Splinter Review
Attachment #8737849 - Attachment description: Part 15 - Graphics layers/shmem changes for replay IPC → Part 15: Graphics layers/shmem changes for replay IPC
Attached patch WIP (obsolete) — Splinter Review
Rolled up patch updated with the last couple weeks of work. This mainly overhauls how breakpoints are handled when debugging a replaying process, and gets forward and reverse stepping to work. For now, stepping is handled by setting breakpoints at specific spots instead of setting a handler for a specific frame. This should work fine except when a script is invoked recursively. Eventually the replaying process should be able to keep track of frame identity even after rewinding, which would allow setting handlers on the frames themselves and fixing this.
Attachment #8737826 - Attachment is obsolete: true
Attached patch WIP #2 (obsolete) — Splinter Review
Diff on top of the previous patch with the last couple weeks of work. This has various bug fixes and robustness and efficiency improvements. The most important changes are: - PLDHashTables (along with the various classes that are based on them) all have a deterministic iteration order when recording or replaying. This allows removal of all other hashtable related instrumentation in the browser, except for one spot that uses std::set (whose code we can't modify). - Rendered graphics data can be sent both ways between the replaying process and the middleman. This allows the replaying process to access the rendered graphics when needed, and also lets the middleman leverage the snapshot mechanism to restore earlier graphics data when updating compositor state after rewinding. I've started working through recording and replaying the web-platform tests, one hundred or so at a time. Currently we can record and replay all the 2dcontext tests (about 800 tests, or 10% of the suite). Three of them give different pass/fail results from a normal execution, as we currently use the CG draw target when recording/replaying, instead of Skia (though since the backend in use is hidden behind the DrawTargetRecording and never renders anything in the replaying process, fixing this shouldn't be too hard). From here I want to go through the rest of the web-platform tests, both to fix bugs and to see what browser features we can easily support while recording/replaying. With enough work it shouldn't be terribly hard to get all features tested by web-platform to record/replay, but I want to shift away from building out the prototype to handle new features (like webgl) and towards building a, uh, minimum viable product, something that people can start using and testing. Part of that is figuring out which features we do or don't want to support, and cleanly failing to record when dealing with an unsupported feature instead of just crashing somewhere.
Attached patch WIP #2 (obsolete) — Splinter Review
With this patch, all web-platform tests run by mach can be recorded and replayed, with the following exceptions/caveats: - The custom threading used when playing audio/video is not handled yet, so this patch prevents these from being activated when recording. - DOM workers are similarly not allowed to be created. While recording/replaying executions with workers shouldn't be hard to do (except when SharedArrayBuffers are in use), right now the replay debugger only supports JS which runs on the main thread, and if we record/replay executions with workers then we should be able to debug the workers. A couple other technologies aren't tested by web-platform but also need to be disallowed: - WebGL uses a different graphics pipeline from normal rendering. - asm.js creates signal handlers which interfere with the handler used while replaying. Again, per comment 114 it should be possible to get all of these to work, but in the interest of focusing on improving the quality and robustness of record/replay on 'normal' web pages and on porting this stuff to other platforms (i.e. Windows) these restrictions are in place for now.
Attachment #8747462 - Attachment is obsolete: true
Attached patch WIP (obsolete) — Splinter Review
Rolled up patch with all changes as of a few weeks ago. I'll attach a couple patches with new stuff shortly.
Attachment #8739966 - Attachment is obsolete: true
Attachment #8750383 - Attachment is obsolete: true
Attached patch Mac WIP (obsolete) — Splinter Review
Some fit-and-finish changes to the Mac port, going on top of the previous patch. This has some bug fixes and internal improvements, but the main change is to allow triggering record/replay in a new tab from the browser's UI (Tools -> Web Developer -> Record/Replay Execution}, instead of using environment variables at the command line.
Attached patch Windows WIP (obsolete) — Splinter Review
Windows port work from the last couple weeks. This mostly focuses on redirecting windows system library functions, though this is not yet far enough along to record/replay a simple web page. The main neat thing about this patch is that it can hook all functions in a given DLL to make sure that we never enter the DLL except by calling through a function in the DLL (or another system DLL) that has been manually hooked. This gives a much stronger check than in the Mac port for where the recording boundary is placed.
Attached patch Mac WIP (obsolete) — Splinter Review
Some changes to the Mac port from the last few weeks. This is mainly bugfixes and some reorganization, and starts an overhaul of the snapshot mechanism, representing snapshots as old copies of pages instead of diffs between snapshots (this is simpler and takes more memory/disk space, but makes it easy to discard old snapshots and avoid using a ton of disk space in long running executions), as well as doing most snapshot work off the main thread. Most of what I've been working on the last few weeks has been on the Windows port, though, I'll attach a patch for that tomorrow.
Attachment #8757432 - Attachment is obsolete: true
Attached patch Mac WIP (obsolete) — Splinter Review
Updated Mac WIP with a now-functioning overhauled snapshot mechanism. Replay forwards is much faster now, due to the combination of simpler snapshots (old page copies instead of diffs), serializing snapshots off the main thread, and (most importantly) from not taking snapshots on every single compositor update. Right now snapshots are normally taken every 30 frames when going forwards (this will need adjustment based on workload), and when rewinding we first go back and generate all the intermediate snapshots that were skipped over, so that animations are rewound without skipping any frames. This snapshot generation causes the replaying process to stop showing new frames for a few seconds, so is pretty noticeable; rather than masking over this (on long running replays we will probably want to go up to 100 or more frames between snapshots) the UI could show a "Rewinding in progress" message with a progress bar or something when this happens.
Attachment #8763175 - Attachment is obsolete: true
Attached patch Windows WIP (obsolete) — Splinter Review
Updated Windows WIP. This mostly adds a lot of redirections, including some nice clean handling of COM object interfaces, and is I think getting close to being able to replay simple web pages.
Attachment #8757439 - Attachment is obsolete: true
See Also: → 1282039
Blocks: 1282039
See Also: 1282039
Attached patch WIP (obsolete) — Splinter Review
Rolled up and combined patch (for both mac and windows), incorporating the last couple of weeks of work. Windows replaying continues to slowly progress, and this also has some nice fixes for Mac like removing all the manual instrumentation of objective C messages (the core objc_msgSend routine is hooked instead) and a general cleanup and C++ification of the public API.
Attachment #8757431 - Attachment is obsolete: true
Attachment #8763338 - Attachment is obsolete: true
Attachment #8763347 - Attachment is obsolete: true
Attachment #8737828 - Attachment is obsolete: true
Attachment #8737830 - Attachment is obsolete: true
Attachment #8737832 - Attachment is obsolete: true
Attachment #8737833 - Attachment is obsolete: true
Attachment #8737835 - Attachment is obsolete: true
Attachment #8737836 - Attachment is obsolete: true
Attachment #8737837 - Attachment is obsolete: true
Attachment #8737838 - Attachment is obsolete: true
Attachment #8737839 - Attachment is obsolete: true
Attachment #8737841 - Attachment is obsolete: true
Attachment #8737842 - Attachment is obsolete: true
Attachment #8737843 - Attachment is obsolete: true
Attachment #8737844 - Attachment is obsolete: true
Attachment #8737846 - Attachment is obsolete: true
Attachment #8737847 - Attachment is obsolete: true
Attachment #8737848 - Attachment is obsolete: true
Attachment #8737849 - Attachment is obsolete: true
Attachment #8737850 - Attachment is obsolete: true
Attachment #8737851 - Attachment is obsolete: true
Attachment #8737852 - Attachment is obsolete: true
Attachment #8767746 - Attachment is obsolete: true
From now on I'm going to use a bitbucket repo for keeping track of the most recent series of patches for this bug. The link is here: https://bitbucket.org/bhackett1024/mozilla-web-replay/. I've also linked this from the web replay design doc in the URL field of this bug (and would link the repo directly there too if the URL field supported multiple urls). Anyways, the past few weeks I've mostly been experimenting with different ways of handling objects whose destructors have recorded events and are triggered by GC/CC. Up to now these objects have all just been permanently leaked (except for some nsPipe-related streams whose destructors *must* run for the browser to work, and which required some ugly workarounds), which is gross and wasteful and requires some weird instrumentation sprinkled around the browser. I tried making the destructors for these specific objects run at deterministic points, but while that improves things a bit the instrumentation required is even worse. The new plan is a slight redesign of the boundary between the deterministic and non-deterministic parts of the browser. Previously, the GC was non-deterministic, and since references on DOM/COM/etc. objects are released via the GC the destructors of these objects were also non-deterministic. Now, reference releases performed by the GC are deterministic. During recording we keep track of the points when deferred releases are triggered, and schedule them so that they occur at a specific point in the recording which we can find later (rather than wherever the GC itself happens). During replay, GC finalization does not actually release references, but rather the references are released at the same point they were released in the recording. (This design means that during replay some GC objects can hold dangling pointers to already-finalized objects, but since the object will have already been finalized at this point in the recording it never be used again in the replay.) Since non-determinism has been isolated to the GC, objects should be destroyed at the same points during recording and replay and no instrumentation in individual classes is needed.
The latest changesets on the bitbucket repo have some new logic for handling a kind of tricky issue with the debugger. Even when inspecting the JS state of a page, it's pretty easy to get the debugger in the replaying process to try to access parts of the system, e.g. accessing the 'font' property of a 2d canvas context will cause objective C messages to be sent to the system. We can't execute these calls/messages in the replaying process, as they were never performed during the recording. Previously these calls would kill the replaying process, and we were trying to avoid them using some (ad hoc, woefully incomplete) tests. Now, whenever a debugging IPDL message is received that might hit one of these calls/messages, we first take a snapshot. If a call/message is hit later, the snapshot is restored, the IPDL message is responded to with an indication of the failure, and the middleman process (where the debugger server lives) can respond to it appropriately.
Attached patch patch (obsolete) — Splinter Review
Rolled up patch for an initial landing. Only the Mac port works here, and testing is still pretty limited --- though there are (just one right now, soon to be more) automated tests. Structurally the code is in good shape, and I don't see it changing much anytime soon, so landing code now will allow people to try things out, allow fuzzing, and allow integration into the Devtools. Tomorrow I'll post broken up parts of this patch for review, and next I'll mainly be working on finishing the Windows port, maybe the Linux port, and bugfixing. Some information about the overall patch: New code: 32k lines of code in new record/replay files. 10k of this is the udis86 library, most of which is static tables (should have a small compiled code size). JS changes: 1767 insertions, 819 deletions. Most of this is in the Debugger. non-JS changes: 2023 insertions, 608 deletions
The core record/replay/rewind code and public API. The public API is in NSPR, while the implementation is in XUL (toolkit/recordreplay). An alternative here would be to put the API in mfbt, but that is problematic for C code which needs to use the API (this could be worked around).
Add the BSD licensed libudis86 library. Before this lands I'll file a dependent bug for adding this to about:license.
Attached patch Part 1c - Redirections. (obsolete) — Splinter Review
The redirection infrastructure, along with redirections for Mac and Windows.
Initialize and teardown record/replay state. Replaying processes don't exit normally, but rather kill themselves after receiving a Terminate message in the PReplay protocol (in a later patch).
The crash reporter doesn't work correctly while replaying, so this patch disables it while recording/replaying. (This should probably be fixed relatively soon.)
The ordering of atomic operations (mozilla::Atomic or PR_ATOMIC_*) is recorded by default, but this can be avoided (with no associated recording overhead) by using Atomic<AtomicRecording::PreserveNothing> or PR_ATOMIC_*_NO_RECORD.
Change the atomics unit tests to not record/replay activity. Otherwise there are link errors I think.
The chromium windows message pump uses InterlockedExchange for performing some atomic operations, which can't be redirected (it's a compiler intrinsic or something). These operations need to be recorded for consistent replay, so this patch changes these over to PR_ATOMIC.
Allow atomic refcounting classes (RefCounted, AtomicRefCounted) to specify whether to record refcount activity.
Allow xpcom mutex and monitor classes to specify whether their behavior should be recorded.
Don't record refcount adds/releases in nsISupports or nsStringBuffer objects. This is important for avoiding overhead for these classes when not recording/replaying, for reducing recording size, and for avoiding issues where references are directly released from GC finalizers (though there's a separate mechanism for that in a later patch and this might not be an issue anymore).
Don't record some atomic refcounts and other quantities related to graphics surfaces. Because drawing commands are handled in a special way when recording/replaying (see later patches) these quantities might not match up between the two executions.
Lots of JS internal activity does not match up between recording/replay, particularly GC and helper thread behavior and how particular scripts are compiled. This patch avoids recording atomics related to these.
Similarly, use of JS mutexes might not match up between recording/replay. There aren't any JS mutexes whose activity needs to be recorded, so this patch turns off recording for all of them.
Don't record an atomic in the replace-malloc library, to avoid link errors against NSPR.
I think the chaos mode counter can be used directly from GC callbacks, but in any case its behavior doesn't need to be recorded for consistent replay.
Don't record some atomic quantities that NSPR uses internally for condvars and monitors. These quantities aren't related to ensuring consistent record/replay, and if the condvar/monitor is not itself recorded then we don't want to record this ancillary activity either.
The profiler pseudo stack refcounts can be modified while under the GC I think.
Deadlock detectors have an internal lock, and since the detectors can be used with locks that aren't recorded, the internal lock shouldn't be recorded either.
Avoid recording some debugging/statistics atomics, both because doing so isn't necessary and because some of these can be used under the GC I think.
Avoid recording a couple of threading atomics which aren't necessary for consistent record/replay and are accessed under the GC I think.
Avoid recording a monitor which is accessed under the GC.
Incremental GC and finalization aren't yet supported while recording/replaying, mostly because they can happen at different points between the executions and rely on posting runnables for continuing work. Creating and posting runnables requires interacting with parts of the browser whose behavior *is* recorded and replayed exactly. Also, this patch fixes some ways where we could end up performing an incremental GC even if JS::IsIncrementalGCEnabled returns false.
The GC and helper threads keep track of timing for statistics. This patch just causes all these times to be zero when recording/replaying, though an alternative I guess would be to pass through events while getting the current time so that these calls don't interact with the recording.
There are several places where runnables are dispatched from within DOM GC callbacks for continuing GC/CC activity. Since these runnables can't be dispatched from within the GC (for the same reason as incremental GC is disabled) this patch avoids this dispatching when recording or replaying.
Compacting GC can write to a lot of memory and cause large snapshots to be generated while replaying, so this patch disables it in replaying processes. Replaying processes also use raw non-nursery GC pointers as cross-process identifiers (a hack which should be fixed).
PluginUtilsOSX::SetProcessName has a bug where it doesn't check the return value of a CFBundleGetFunctionPointerForName call early enough. When recording or replaying this call always returns null (this is for simplicity, as I haven't seen anywhere we depend on this call returning non-null for important functionality).
The image loader has an array of requests which is sorted by pointer address. Since pointer values can differ between recording and replay, doing this sorting while recording/replaying can cause the replay to diverge from the recording. This patch just doesn't sort the array when recording/replaying, so that requests appear in the array in the order they were inserted. I haven't seen this pattern anywhere else, so haven't tried to generalize it as was done for hash tables.
Finalization witnesses can trigger recorded events / behavior changes from within the GC, so this patch disables them. AFAICT this class isn't used for anything critical (mainly for tests?), but if we do need to preserve its behavior we could do something similar to what happens for nsISupports references released from GC finalizers (see later patch), in which case during replay the witness would be activated at the same point as during the recording (and at a time that may be long after, or even before, the time when the replaying process GC has gotten around to finalizing the associated object.)
Native drawing commands issued directly to the underlying CGContextRef in a DrawTarget are not supported by DrawTargetRecording. Since during replay all drawing is done through a DrawTargetRecording and there is no CGContextRef to speak of, it seems best to just disallow native drawing while recording or replaying.
The replaying process debugger needs scripts to be created in a consistent order, so for simplicity this patch disables lazy and off thread parsing while recording or replaying. Off thread parses also dispatch runnables when they finish, and since off thread parsing happens on JS helper threads they can finish at different points between recording and replay.
The GC nursery collection callback can add information about the GC to some timeline consumer. Doing so can trigger recorded events, so this is disabled when recording or replaying.
Disable runnables which are generated to invoke the debugger after a GC is performed, when recording or replaying.
Since most refcounts are not recorded they might not match up between recording and replay, so this patch disables refcount tracing while recording or replaying.
Disable the background hang monitor while recording or replaying. As for the crash reporter, the hang monitor won't be able to monitor the process state and behave appropriately while replaying an execution.
Disable telemetry while recording or replaying. Telemetry measurements can be added while under the GC.
Media elements are not yet supported while recording or replaying.
DOM workers are not supported yet while recording or replaying.
Accelerated canvases are not supported yet while recording or replaying.
JS signal handlers are not supported yet while recording or replaying.
The slow script dialog is not supported while recording or replaying.
Even though the GC's behavior might differ between recording and replay, other parts of the system should behave consistently. This patch ensures this consistent behavior in three ways: making sure that deferred finalization of GC object references happens at consistent points, making sure that the same set of references is released between recording and replay, and making sure that weak pointers (i.e. from nsWrapperCache) behave the same between recording and replay.
The mMutants set in ShadowLayerForwarder is an std::set whose iteration order may differ between recording and replay. This patch manually instruments uses of this member to ensure consistent behavior, though it's pretty ugly. Alternatives would be to just use a PLDHashTable (which always gives consistent behavior, see later patches) or making a wrapper class like mozilla::set that wraps an std::set and behaves consistently while recording/replaying.
Comment on attachment 8790771 [details] [diff] [review] Part 5a - Disable incremental GC when recording or replaying. Could the parts of this patch that don't involve record/replay be extracted into another patch in a new bug? This sounds really important for Servo, where we currently have incremental GC disabled.
mach_msg is used for both inter-thread and inter-process communication. This patch manually records/replays some calls to this method that are used for IPC, though it might be better to redirect mach_msg itself and intercept all messages that are sent/received between threads as well.
(In reply to Josh Matthews [:jdm] from comment #169) > Comment on attachment 8790771 [details] [diff] [review] > Part 5a - Disable incremental GC when recording or replaying. > > Could the parts of this patch that don't involve record/replay be extracted > into another patch in a new bug? This sounds really important for Servo, > where we currently have incremental GC disabled. Sure, I'll file a new bug once I finish attaching patches here (still a ways to go, alas).
Additional JS instrumentation to mark regions where recorded events should not occur (GC, helper threads, interrupt callback) and where the recording has been invalidated (overrecursion, forced return from the interrupt callback).
Similar to part 8a, manually ensure that hash table iteration behaves consistently when operating on the transferableObjects GCHashSet during structured cloning.
The random number seed generation incorporates the environment and its pointer address. Since this can address vary between recording and replay, don't use the environment in such cases.
Use the record/replay hash table iteration ordering API internally within the PLD and PL hashtable implementations, to ensure that these tables always behave consistently when iterated over by callers.
Graphics code uses locks in shared memory to coordinate management of graphics data. Since the memory can be written by another process, its contents will vary between recording and replay. This patch instruments the classes managing these locks so that the values read while recording will be reproduced during replay.
This patch has the PReplay IPDL protocol (the only protocol by which a middleman and replaying process can communicate) and the parent (middleman) side of the implementation.
Middleman processes have a special PCompositorBridgeChild, MiddlemanCompositorChild, which is used to relay graphics updates from the replaying process to the chrome process. This patch includes external changes necessary to make sure that we don't create a normal CompositorBridgeChild or try to use it to send updates from the middleman's content itself (an about:blank document) to the chrome process.
The middleman process must shut down the replaying process when it itself is being shut down.
The middleman and replaying processes both send messages to each other off the main thread. Some of these messages are synchronous and high priority (to avoid assertions related to normal priority parent->child sync messages) and this runs into an assertion failure in MessageChannel::DispatchSyncMessage. This patch relaxes the assertion.
This patch has the child (replaying process) side of the PReplay protocol.
SharedMemory has an associated memory reporter which is normally created lazily with the first SharedMemory itself. Since replaying processes allocate special shared memory we don't want those memories to interfere with creation of this reporter. This patch adds an API so that the reporter can be created ahead of time at a specific point during both recording and replay.
During replay, the shared memory subsystem can communicate with two other pids. First, there is the original pid which it communicated with while recording, which no longer exists and whose system call results are all replayed from the recording. Second, there is the middleman process' pid which it communicates with for real (thread events are passed through to the system). Communication with these pids happens on different threads, but some state is shared between the two. This patch manages this pretty tricky situation, and also adds an interrupt message so that snapshots can be taken while the thread communicating with the middleman is idle.
This patch adds interrupt logic similar to that used for the shared memory thread, so that the replay specific IPC message pump and message loop threads can go idle and allow a snapshot to be taken.
If it is unable to send an entire message, the IPC message pump can send only part of the message before returning to the outer loop. If we take a snapshot then and restore to it later then we end up sending a torn up message fragment to the middleman, which is bad. This patch avoids returning to the outer loop by busy waiting until the entire outgoing message is sent.
Since JS helper threads never block on a recorded resource, they behave similarly to replay specific threads and need to a special mechanism for being interrupted and going idle so that a snapshot can be taken.
This patch makes sure that communication channels are set up with the middleman process when initializing the replaying process.
Replay specific threads should be invisible to the sampling profiler.
Changes to the C++ JS debugger interface to support using a replay debugger. This includes changes to the debugger object guts (a new REPLAYING slot, different possibilities for the private pointer), changes to the debugger function macros to allow easily forwarding calls to the replay debugger, changes to the handling of scripts to allow for replay-agnostic calls operating on script contents, and refactoring to avoid duplicated logic in the replay debugger. I'm sorry about the size of this patch, unfortunately it was pretty hard to completely separate everything going on here.
This patch has the replay debugger logic, which consists of an object attached to the JS debugger in a C++ process for sending queries to the replaying process state and corresponding logic for answering those queries in the replaying process itself.
This patch has the graphics code using the interfaces in ReplayChild.h to notify the recording/replaying process about IPDL actors and shared memories that are created and compositor updates that are performed, so that the replaying process can operate on these updates and forward them to the middleman process.
This patch has some refactoring to avoid duplicated code in DrawTargetRecordReplay and MiddlemanTranslator.
DrawTargetRecording::GetBackendType() looks wrong to me. Maybe I'm misunderstanding what the backend type here means, but there are a number of places where code checks the backend type of a draw target before static_cast'ing it to the corresponding class (e.g. BorrowedCGContext::BorrowCGContextFromDrawTarget).
The MiddlemanTranslator needs to control the creation of the draw targets that are associated with it, so this patch calls CreateDrawTarget on it directly. I think this is functionally the same as before for PrintTranslator, though maybe it would be better to have a Translator::CreateSimilarDrawTarget?
This patch adds a NativeFontResourceMac class, so that font resources can be created and transported with Mac draw targets and so that DrawTargetRecording can be used on Macs.
Add a RECORDREPLAY graphics backend (which unfortunately has a similar name to the RECORDING backend, despite the two being very different things).
Add a DrawTargetRecordReplay draw target. When recording this wraps a native draw target and passes through thread events whenever calling into it (so that drawing commands are performed without affecting the recording at all). When replaying this does not wrap a draw target at all but reproduces information captured during the recording, and requests rendered data from the middleman when it needs to be flushed.
Have the gfx::Factory create DrawTargetRecordReplay draw targets when recording or replaying, similar to how it already creates DrawTargetRecording targets when necessary.
A large portion (75%+) of recording files can be due to calling into the system to fill in the data associated with fonts. It is a lot more efficient to just record the final generated font data, so this patch manually instruments the font data generation calls to perform this optimization.
Client side changes to the Devtools interface for recording/replaying. This adds new 'Record Execution' and 'Replay Execution' menu items and reverse playing/stepping buttons in the debugger UI. These are gated on the new devtools.recordreplay.enabled being set to true. This pref is off by default, and for now only exists at all on Mac nightly builds.
Add goo so that calling gBrowser.addTab with the right arguments (as the previous patch does) will cause a new recording or middleman process to be spawned and associated with the tab.
Changes to the server side (middleman process) Devtools code for interacting with a replaying debugger.
Attached patch Part 17 - Tests.Splinter Review
Last patch!! This adds a basic record/replay mochitest, though I want to write more tests soon.
Depends on: 1302523
I filed bug 1302523 for the incremental GC issue. Here is an updated patch which goes on top of the one in that bug.
Attachment #8790771 - Attachment is obsolete: true
Attachment #8790736 - Flags: review?(jwalden+bmo)
Attachment #8790738 - Flags: review?(jwalden+bmo)
Attachment #8790745 - Flags: review?(ehsan)
Attachment #8790747 - Flags: review?(nfroyd)
Attachment #8790748 - Flags: review?(nfroyd)
Attachment #8790750 - Flags: review?(bas)
Attachment #8790751 - Flags: review?(jdemooij)
Attachment #8790754 - Flags: review?(nfitzgerald)
Attachment #8790755 - Flags: review?(n.nethercote)
Attachment #8790756 - Flags: review?(nfroyd)
Attachment #8790765 - Flags: review?(nfitzgerald)
Attachment #8790750 - Flags: review?(bas) → review+
Attachment #8790767 - Flags: review?(nfroyd)
Attachment #8790768 - Flags: review?(nfroyd)
Attachment #8790769 - Flags: review?(nfroyd)
Attachment #8790770 - Flags: review?(nfroyd)
Attachment #8790772 - Flags: review?(sphink)
Attachment #8790774 - Flags: review?(continuation)
Attachment #8790775 - Flags: review?(jcoppeard)
Attachment #8790781 - Flags: review?(b56girard)
Attachment #8790786 - Flags: review?(matt.woodrow)
Attachment #8790781 - Flags: review?(b56girard) → review+
Attachment #8790788 - Flags: review?(nfroyd)
Attachment #8790789 - Flags: review?(bas)
Attachment #8790794 - Flags: review?(jdemooij)
Attachment #8790796 - Flags: review?(continuation)
Attachment #8790800 - Flags: review?(nfitzgerald)
Attachment #8790804 - Flags: review?(nfroyd)
Attachment #8790817 - Flags: review?(vladan.bugzilla)
Attachment #8790818 - Flags: review?(gfritzsche)
Attachment #8790821 - Flags: review?(rjesup)
Attachment #8790822 - Flags: review?(bugs)
Attachment #8790823 - Flags: review?(dvander)
Heads up Brian.. Vladan is no longer at mozilla. You may want to find another reviewer for that patch.
Attachment #8790825 - Flags: review?(luke)
Comment on attachment 8790817 [details] [diff] [review] Part 5m - Disable hang monitor while recording or replaying. Thanks Kannan.
Attachment #8790817 - Flags: review?(vladan.bugzilla) → review?(nfroyd)
Attachment #8790826 - Flags: review?(mrbkap)
Attachment #8790827 - Flags: review?(continuation)
Attachment #8790828 - Flags: review?(jmuizelaar)
Attachment #8790834 - Flags: review?(wmccloskey)
Attachment #8790836 - Flags: review?(jdemooij)
Attachment #8790837 - Flags: review?(jorendorff)
Attachment #8790839 - Flags: review?(franziskuskiefer)
Attachment #8790840 - Flags: review?(nfroyd)
Attachment #8790842 - Flags: review?(nical.bugzilla)
Comment on attachment 8790825 [details] [diff] [review] Part 6d - Disable wasm signal handlers when recording or replaying. Review of attachment 8790825 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/asmjs/WasmSignalHandlers.cpp @@ +1287,5 @@ > if (getenv("JS_DISABLE_SLOW_SCRIPT_SIGNALS") || getenv("JS_NO_SIGNALS")) > return false; > > + // Signal handlers are currently disabled when recording or replaying. > + if (PR_IsRecordingOrReplaying()) So this will have the effect of completely disabling wasm + asm.js optimizations on all platforms. Is it really necessary? I know the signal used for interrupts is nondeterministic, but if we dealt with that separately, the wasm/asm.js out-of-bounds handlers are deterministic and so I wouldn't think they'd be a problem. Also, just to make sure: PR_IsRecordingOrReplaying() has an constant value after process startup, right?
(In reply to Luke Wagner [:luke] from comment #207) > So this will have the effect of completely disabling wasm + asm.js > optimizations on all platforms. Is it really necessary? I know the signal > used for interrupts is nondeterministic, but if we dealt with that > separately, the wasm/asm.js out-of-bounds handlers are deterministic and so > I wouldn't think they'd be a problem. The main issue is that replaying processes install their own signal handler for keeping track of memory modified between snapshots. This signal handler needs to cooperate with the wasm signal handler, which shouldn't be terribly hard to do but does make things more complex and seems better to implement after the initial landing. Otherwise, I agree that there shouldn't be a problem with using the wasm signal handler while recording or replaying. > Also, just to make sure: PR_IsRecordingOrReplaying() has an constant value > after process startup, right? Yes.
Comment on attachment 8790843 [details] [diff] [review] Part 9a - PReplay protocol and parent side implementation. This patch has a mix of IPC (communication with the replaying process), compositor updates (take messages from the replaying process and forward them to the chrome process' PCompositorBridgeParent), and draw event replay (read draw events from the replaying process and translate them in the middleman).
Attachment #8790843 - Flags: review?(wmccloskey)
Attachment #8790843 - Flags: review?(nical.bugzilla)
Attachment #8790843 - Flags: review?(bas)
Attachment #8790847 - Flags: review?(nical.bugzilla)
Attachment #8790848 - Flags: review?(wmccloskey)
Attachment #8790851 - Flags: review?(wmccloskey)
Attachment #8790755 - Flags: review?(n.nethercote) → review+
Comment on attachment 8790854 [details] [diff] [review] Part 10a - Child side implementation of PReplay protocol. This patch has the IPC/compositor/graphics logic in the replaying process corresponding to part 9a.
Attachment #8790854 - Flags: review?(wmccloskey)
Attachment #8790854 - Flags: review?(nical.bugzilla)
Attachment #8790854 - Flags: review?(bas)
Comment on attachment 8790825 [details] [diff] [review] Part 6d - Disable wasm signal handlers when recording or replaying. Righto, makes sense.
Attachment #8790825 - Flags: review?(luke) → review+
Attachment #8790856 - Flags: review?(n.nethercote)
Attachment #8790857 - Flags: review?(wmccloskey)
Attachment #8790859 - Flags: review?(wmccloskey)
Attachment #8790856 - Flags: review?(n.nethercote) → review+
Attachment #8790860 - Flags: review?(wmccloskey)
Attachment #8790862 - Flags: review?(nfitzgerald)
Attachment #8790863 - Flags: review?(wmccloskey)
Attachment #8790864 - Flags: review?(b56girard)
Attachment #8790872 - Flags: review?(jorendorff)
Attachment #8790873 - Flags: review?(jorendorff)
Attachment #8790875 - Flags: review?(nical.bugzilla)
Attachment #8790864 - Flags: review?(b56girard) → review+
Attachment #8790877 - Flags: review?(bas)
Attachment #8790880 - Flags: review?(bas)
Attachment #8790882 - Flags: review?(bas)
Attachment #8790884 - Flags: review?(jmuizelaar)
Attachment #8790885 - Flags: review?(bas)
Attachment #8790888 - Flags: review?(bas)
Attachment #8790890 - Flags: review?(bas)
Attachment #8790892 - Flags: review?(bas)
Attachment #8790895 - Flags: review?(ejpbruel)
Attachment #8790896 - Flags: review?(wmccloskey)
Attachment #8790898 - Flags: review?(jimb)
Attachment #8790913 - Flags: review?(continuation)
Attachment #8790786 - Flags: review?(matt.woodrow) → review+
Attachment #8790764 - Flags: review?(wmccloskey)
Attachment #8790740 - Flags: review?(wmccloskey)
Attachment #8790724 - Flags: review?(wmccloskey)
Attachment #8790725 - Flags: review?(jdemooij)
Attachment #8790726 - Flags: review?(jdemooij)
Attachment #8790730 - Flags: review?(wmccloskey)
Attachment #8790734 - Flags: review?(wmccloskey)
Comment on attachment 8790774 [details] [diff] [review] Part 5c - Don't dispatch runnables for GC or finalization when under the GC and recording or replaying. Review of attachment 8790774 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/base/nsJSEnvironment.cpp @@ +1968,5 @@ > // static > void > nsJSContext::PokeShrinkingGC() > { > + if (sShrinkingGCTimer || sShuttingDown || PR_IsRecordingOrReplaying()) { Could you please add a static method named like SkipCollection which is just "return sShuttingDown || PR_IsRecordingOrReplaying();" and use that here and in the rest of the file? That would make it a little clearer what is the intent.
Attachment #8790774 - Flags: review?(continuation) → review+
Attachment #8790796 - Flags: review?(continuation) → review+
Comment on attachment 8790822 [details] [diff] [review] Part 6b - Disable DOM workers when recording or replaying. I wouldn't really put PR_IsRecordingOrReplaying() in the constructor, but just hide the interfaces from the global. So, in .webidl have something like Func="SomeMethodName". But if you want to start with this approach which breaks any pages using Workers, fine. Though I guess those pages can't be used with replay anyhow, so shouldn't matter much. PR_IsRecordingOrReplaying() isn't threadsafe, but I guess the idea is that PR_IsRecordingOrReplaying() returns always the same value during the process lifetime.
Attachment #8790822 - Flags: review?(bugs) → review+
Wow, this is one of the most exciting features ever, but the implications in terms of how easy it is to forget to instrument future code in the content process and break recording are very scary.
Attachment #8790788 - Flags: review?(nfroyd) → review+
Attachment #8790804 - Flags: review?(nfroyd) → review+
Attachment #8790817 - Flags: review?(nfroyd) → review+
Comment on attachment 8790840 [details] [diff] [review] Part 8f - Ensure that PL and PLD hashtables have consistent iteration order when recording/replaying. Review of attachment 8790840 [details] [diff] [review]: ----------------------------------------------------------------- I think this is OK, though I have concerns about how much overhead this adds when not recording/replaying: history has told us that this code is perf-sensitive (I can't pull up the specific Android bug I'm thinking about right now--we had startup regressions and essentially worked around it by eliminating an extra branch), and adding a (non-XUL!) global variable check in these paths sounds expensive, at least. (I think that'd be 2 pointer chases on ARM, 2 (?) on x86, but only 1 on x86-64.) Have you done any performance testing of this? r=me assuming we can come to some understanding with the above. Note that I'm not an NSPR peer, so while I think the NSPR changes look OK, somebody else should approve those. ::: nsprpub/lib/ds/plhash.c @@ +260,5 @@ > he->value = value; > he->next = *hep; > *hep = he; > ht->nentries++; > + PR_RecordReplayAddHashTableItem(ht, (size_t) he); Why did you use size_t in this API and related ones rather than void*? @@ +402,5 @@ > + } > + } > + PR_ASSERT(index == nentries); > + > + PR_RecordReplaySortHashTableItems(ht, (size_t*)ordering, ht->nentries); Same kind of question as above. void** here seems better.
Attachment #8790840 - Flags: review?(nfroyd) → review+
Comment on attachment 8790747 [details] [diff] [review] Part 4b - Make recording optional in mozilla mutexes and monitors. Review of attachment 8790747 [details] [diff] [review]: ----------------------------------------------------------------- r=me assuming the explanation for the BaseAutoLock change makes sense. Open to bikeshedding on the enum name and value names. ::: xpcom/glue/Mutex.h @@ +42,5 @@ > * @returns If failure, nullptr > * If success, a valid Mutex* which must be destroyed > * by Mutex::DestroyMutex() > **/ > + explicit OffTheBooksMutex(const char* aName, bool aRecorded = true) Please change the boolean here and in related places to an actual enum, since we're going to be passing literal values in later patches, and that way we can avoid writing /* recorded = */ everywhere because the enum name will do it for us. The AtomicRecording enum would work nicely here, but the name is a bit off. Call the enum RecordSynchronization::Preserve{Ordering,Nothing}? @@ +160,5 @@ > * > * @param aLock A valid mozilla::Mutex* returned by > * mozilla::Mutex::NewMutex. > **/ > + explicit BaseAutoLock(T aLock MOZ_GUARD_OBJECT_NOTIFIER_PARAM) Why are you changing this to require reference parameters in the users of this template? Please change it back or provide an explanation. If the way it was isn't workable, then can we please add: static_assert(mozilla::IsReference<T>::value, "must provide a reference type"); to this class, or the next person who comes along and specializes BaseAutoLock stands a good change of getting things wrong.
Attachment #8790747 - Flags: review?(nfroyd) → review+
Comment on attachment 8790770 [details] [diff] [review] Part 4n - Don't record XPT table monitor. Review of attachment 8790770 [details] [diff] [review]: ----------------------------------------------------------------- r=me with corresponding changes from the Mutex review.
Attachment #8790770 - Flags: review?(nfroyd) → review+
Attachment #8790767 - Flags: review?(nfroyd) → review+
Comment on attachment 8790748 [details] [diff] [review] Part 4c - Don't record activity on ThreadSafeAutoRefCnt or nsStringBuffer refcounts. Review of attachment 8790748 [details] [diff] [review]: ----------------------------------------------------------------- Why does not recording these avoid overhead when we're not recording/replaying? Are we actually calling into the record/replay machinery for every atomic access even when we're not recording/replaying? Or am I mis-parsing your explanation? I am curious whether subsequent patches address the GC finalization bits.
Attachment #8790748 - Flags: review?(nfroyd) → review+
Attachment #8790756 - Flags: review?(nfroyd) → review+
Attachment #8790768 - Flags: review?(nfroyd) → review+
Comment on attachment 8790769 [details] [diff] [review] Part 4m - Don't record some threading atomics. Review of attachment 8790769 [details] [diff] [review]: ----------------------------------------------------------------- Major brownie points for separating out this massive patch into small understandable pieces! Thank you!
Attachment #8790769 - Flags: review?(nfroyd) → review+
Comment on attachment 8790895 [details] [diff] [review] Part 15a - Client side devtools changes. For frontend changes to the debugger James is probably a better reviewer than me.
Attachment #8790895 - Flags: review?(ejpbruel) → review?(jlong)
Attachment #8790840 - Flags: review?(wtc)
Attachment #8790764 - Flags: review?(wmccloskey) → review?(wtc)
Attachment #8790736 - Flags: review?(wtc)
Comment on attachment 8790724 [details] [diff] [review] Part 1a - Record/replay/rewind infrastructure. This patch adds a new NSPR public header.
Attachment #8790724 - Flags: review?(wtc)
Comment on attachment 8790828 [details] [diff] [review] Part 8a - Manually ensure hash table iteration ordering consistency in mMutants std::set. Review of attachment 8790828 [details] [diff] [review]: ----------------------------------------------------------------- I would prefer using some kind of stable set here instead of having to manually instrument. Is there a plan for how we're going to avoid accidentally introducing non-determinism like this in?
(In reply to Nathan Froyd [:froydnj] from comment #215) > Comment on attachment 8790840 [details] [diff] [review] > Part 8f - Ensure that PL and PLD hashtables have consistent iteration order > when recording/replaying. > > Review of attachment 8790840 [details] [diff] [review]: > ----------------------------------------------------------------- > > I think this is OK, though I have concerns about how much overhead this adds > when not recording/replaying: history has told us that this code is > perf-sensitive (I can't pull up the specific Android bug I'm thinking about > right now--we had startup regressions and essentially worked around it by > eliminating an extra branch), and adding a (non-XUL!) global variable check > in these paths sounds expensive, at least. (I think that'd be 2 pointer > chases on ARM, 2 (?) on x86, but only 1 on x86-64.) Have you done any > performance testing of this? > Yeah, hashtable code is super perf sensitive and already too slow (which is why we in some cases have caches on top of it), so yes, please be careful with that.
(In reply to Nathan Froyd [:froydnj] from comment #215) > I think this is OK, though I have concerns about how much overhead this adds > when not recording/replaying: history has told us that this code is > perf-sensitive (I can't pull up the specific Android bug I'm thinking about > right now--we had startup regressions and essentially worked around it by > eliminating an extra branch), and adding a (non-XUL!) global variable check > in these paths sounds expensive, at least. (I think that'd be 2 pointer > chases on ARM, 2 (?) on x86, but only 1 on x86-64.) Have you done any > performance testing of this? I haven't done any performance testing, but I want to make sure this doesn't measurably regress anything. Are there tests I should focus on to make sure the hash table instrumentation doesn't have a perf effect? > ::: nsprpub/lib/ds/plhash.c > @@ +260,5 @@ > > he->value = value; > > he->next = *hep; > > *hep = he; > > ht->nentries++; > > + PR_RecordReplayAddHashTableItem(ht, (size_t) he); > > Why did you use size_t in this API and related ones rather than void*? Well, internally they are treated as size_t's, but I agree the hash table APIs would make more sense using void* and will change them.
(In reply to Brian Hackett (:bhackett) from comment #224) > (In reply to Nathan Froyd [:froydnj] from comment #215) > > I think this is OK, though I have concerns about how much overhead this adds > > when not recording/replaying: history has told us that this code is > > perf-sensitive (I can't pull up the specific Android bug I'm thinking about > > right now--we had startup regressions and essentially worked around it by > > eliminating an extra branch), and adding a (non-XUL!) global variable check > > in these paths sounds expensive, at least. (I think that'd be 2 pointer > > chases on ARM, 2 (?) on x86, but only 1 on x86-64.) Have you done any > > performance testing of this? > > I haven't done any performance testing, but I want to make sure this doesn't > measurably regress anything. Are there tests I should focus on to make sure > the hash table instrumentation doesn't have a perf effect? The autophone tests have often shown Android-specific startup regressions, though I'm not sure how we run prospective patches through those. I'd guess for the cases smaug mentioned, dromaeo, kraken, etc. and anything else we run in talos would be a good start. (I know we run kraken in talos; not sure about dromaeo.)
(In reply to Nathan Froyd [:froydnj] from comment #216) > @@ +160,5 @@ > > * > > * @param aLock A valid mozilla::Mutex* returned by > > * mozilla::Mutex::NewMutex. > > **/ > > + explicit BaseAutoLock(T aLock MOZ_GUARD_OBJECT_NOTIFIER_PARAM) > > Why are you changing this to require reference parameters in the users of > this template? Please change it back or provide an explanation. This change was necessary because of AnyStaticMutex. This class is implicitly constructed and can't be passed by non-const reference to calls. > If the way it was isn't workable, then can we please add: > > static_assert(mozilla::IsReference<T>::value, "must provide a reference > type"); > > to this class, or the next person who comes along and specializes > BaseAutoLock stands a good change of getting things wrong. Unfortunately, it looks like IsReference<AnyStaticMutex> will return false, for the same reason I had to change the T& to a T. Even without this static assertion, though, we should get compile/link errors in the typical way the class could be misused (doing BaseAutoLock<T> instead of BaseAutoLock<T&> for a typical mutex class T) because mutex classes generally prohibit copy constructors / assignments. That is the case for all the other classes which BaseAutoLock is used with.
(In reply to Nathan Froyd [:froydnj] from comment #218) > Comment on attachment 8790748 [details] [diff] [review] > Part 4c - Don't record activity on ThreadSafeAutoRefCnt or nsStringBuffer > refcounts. > > Review of attachment 8790748 [details] [diff] [review]: > ----------------------------------------------------------------- > > Why does not recording these avoid overhead when we're not > recording/replaying? Are we actually calling into the record/replay > machinery for every atomic access even when we're not recording/replaying? > Or am I mis-parsing your explanation? For Atomic<PreserveNothing> there is no overhead at all with operations on the atomic values. For Atomic<PreserveOrdering> every operation will call PR_Begin/EndOrderedAtomicAccess, which in a process that is not recording/replaying will each boil down to a test of the gPRIsRecordingOrReplaying global variable. > I am curious whether subsequent patches address the GC finalization bits. Part 7 is the main patch that addresses deterministic GC finalization, though there might still be places where references are directly released from GC finalizers without going through a deferred finalizer.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #222) > Comment on attachment 8790828 [details] [diff] [review] > Part 8a - Manually ensure hash table iteration ordering consistency in > mMutants std::set. > > Review of attachment 8790828 [details] [diff] [review]: > ----------------------------------------------------------------- > > I would prefer using some kind of stable set here instead of having to > manually instrument. Is there a plan for how we're going to avoid > accidentally introducing non-determinism like this in? The best way of avoiding this non-determinism is to use hash table classes that always behave consistently when recording/replaying. Right now that includes all tables based on PLDHashTable or PLHashTable, but it does not include std::set. Would it be OK to change mMutants to be a nsDataHashtable, or should I make a wrapper for std::set that behaves consistently?
Comment on attachment 8790827 [details] [diff] [review] Part 7 - Ensure deterministic interaction of GC with CC and object references. Review of attachment 8790827 [details] [diff] [review]: ----------------------------------------------------------------- Olli should review this part. He's more aware of the perf implications of wrapper cached stuff than I am.
Attachment #8790827 - Flags: review?(continuation) → review?(bugs)
Sorry, I don't entirely understand why incremental GC can't be done. You said you can't post a runnable during a GC, but why is that? Is the scheduling of IGC slices too nondeterministic? How is the scheduling of the start of a GC any better? How is the CC not affected? Or is the basic idea here to just ignore whatever the GC and CC do and have the replay code schedule the destruction of the objects?
(In reply to Brian Hackett (:bhackett) from comment #228) > Would it be OK to change mMutants to be a > nsDataHashtable, or should I make a wrapper for std::set that behaves > consistently? Changing mMutants to a nsDataHashtable should be fine.
Are PR_IsRecordingOrReplaying()/PR_IsRecording() out of line function calls? If so, I expect they would regress binding performance quite noticeably in GetWrapperJSObject(). SetWrapper() might be a bit less perf-sensitive....
(In reply to Boris Zbarsky [:bz] (busy, pun intended) from comment #232) > Are PR_IsRecordingOrReplaying()/PR_IsRecording() out of line function calls? > If so, I expect they would regress binding performance quite noticeably in > GetWrapperJSObject(). SetWrapper() might be a bit less perf-sensitive.... They are inlined (see part 1a), but they are checking non-local-to-XUL boolean variables, which are more expensive.
Ah. Would need to measure to say anything useful about that. :(
Comment on attachment 8790884 [details] [diff] [review] Part 14d - Support creating native font resources for Mac draw targets. Review of attachment 8790884 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/2d/ScaledFontMac.h @@ +76,5 @@ > + > + NativeFontResourceMac(uint8_t* aData, size_t aSize) > + : mSize(aSize) > + { > + mData = (uint8_t*) malloc(aSize); It doesn't look mData ever gets freed.
Attachment #8790884 - Flags: review?(jmuizelaar) → review-
(In reply to Andrew McCreight [:mccr8] from comment #230) > Sorry, I don't entirely understand why incremental GC can't be done. You > said you can't post a runnable during a GC, but why is that? Is the > scheduling of IGC slices too nondeterministic? How is the scheduling of the > start of a GC any better? How is the CC not affected? Or is the basic idea > here to just ignore whatever the GC and CC do and have the replay code > schedule the destruction of the objects? The scheduling of incremental GC slices and non-incremental GCs are both non-deterministic. The problem is that posting a runnable requires interacting with many parts of the system that need to behave deterministically, particularly the main thread's message loop. If we add a way to schedule and then execute runnables that might be non-deterministic between recording and replay, we should be able to perform incremental GC while recording or replaying. During replay we ignore what the GC is doing wrt objects that hold references on nsISupports things or other references that require deferred finalization. Instead, the references which were released while recording are released at the same time during replay. The cycle collector should run at the same times during recording and replay, and since it sees the same object graph should remove the same cycles.
Comment on attachment 8790828 [details] [diff] [review] Part 8a - Manually ensure hash table iteration ordering consistency in mMutants std::set. Review of attachment 8790828 [details] [diff] [review]: ----------------------------------------------------------------- Let's just use the hash table that does this for us.
Attachment #8790828 - Flags: review?(jmuizelaar) → review-
(In reply to Nathan Froyd [:froydnj] from comment #225) > (In reply to Brian Hackett (:bhackett) from comment #224) > > (In reply to Nathan Froyd [:froydnj] from comment #215) > > > I think this is OK, though I have concerns about how much overhead this adds > > > when not recording/replaying: history has told us that this code is > > > perf-sensitive (I can't pull up the specific Android bug I'm thinking about > > > right now--we had startup regressions and essentially worked around it by > > > eliminating an extra branch), and adding a (non-XUL!) global variable check > > > in these paths sounds expensive, at least. (I think that'd be 2 pointer > > > chases on ARM, 2 (?) on x86, but only 1 on x86-64.) Have you done any > > > performance testing of this? > > > > I haven't done any performance testing, but I want to make sure this doesn't > > measurably regress anything. Are there tests I should focus on to make sure > > the hash table instrumentation doesn't have a perf effect? > > The autophone tests have often shown Android-specific startup regressions, > though I'm not sure how we run prospective patches through those. I'd guess > for the cases smaug mentioned, dromaeo, kraken, etc. and anything else we > run in talos would be a good start. (I know we run kraken in talos; not > sure about dromaeo.) To run the autophone "t" tests on try (which I think is what you want in this case) you can use this try syntax: try: -b o -p android-api-15 -u autophone-s1s2 -t none
FWIW, some of this stuff here, like GetWrapperJSObject() probably needs microbenchmarking. Talos wouldn't tell anything useful, it is too high level.
Attachment #8790826 - Flags: review?(mrbkap) → review+
Comment on attachment 8790895 [details] [diff] [review] Part 15a - Client side devtools changes. Review of attachment 8790895 [details] [diff] [review]: ----------------------------------------------------------------- This makes changes to the old debugger, which is being deprecated. New features should all go into the new debugger which is on github: https://github.com/devtools-html/debugger.html I'm more than happy to help make the changes and show you how to get it running in Firefox, it's very easy. The changes to the files in `browser/base/content` look fine though, although I don't know how those menu items work, but it seems like simple changes.
Attachment #8790895 - Flags: review?(jlong) → review-
Comment on attachment 8790913 [details] [diff] [review] Part 5a - Disable incremental GC when recording or replaying. Review of attachment 8790913 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jsgc.cpp @@ +7236,5 @@ > > JS_PUBLIC_API(bool) > JS::IsIncrementalGCEnabled(JSContext* cx) > { > + return cx->gc.isIncrementalGCEnabled() && !PR_IsRecordingOrReplaying(); Why is this here and not in isIncrementalGCEnabled()? I guess it doesn't matter as this is the only caller...
Attachment #8790913 - Flags: review?(continuation) → review+
@bhackett There was a frontend devtools patch that I r-'ed above. This bug already is huge, and I think it would be best if you landed this feature without any frontend changes. After all of the implementation lands, we can land the UI to expose it in a separate bug. That way this bug is more focused, and it'll be easier to help you land the frontend changes without the discussion about backend details. I say this also because we're developing the debugger outside of m-c now, so it'll be easier to coordinate those changes in a more focused bug. We also need to work with out designer to implement the right UX.
(In reply to James Long (:jlongster) from comment #242) As a procedural question, separating out the front-end patches into their own bug seems like a good idea. Regarding UI, we've already begun discussing possibilities with Helen, but we were waiting to get something in-tree on the server side to understand exactly how the new server behaves before developing the UI in detail. I think it would be extremely valuable for people (not least us!) to have some way to experiment with WebReplay before the fully-designed UI is ready. Could we land rough work behind a pref? What's the best way to make this happen?
Flags: needinfo?(jlong)
Attachment #8790794 - Flags: review?(jdemooij) → review+
Comment on attachment 8790836 [details] [diff] [review] Part 8c - Mark places in the JS engine where recording events are disallowed and where the recording should be invalidated. Review of attachment 8790836 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/gc/Marking.cpp @@ +3006,5 @@ > > JS_FRIEND_API(bool) > JS::UnmarkGrayGCThingRecursively(JS::GCCellPtr thing) > { > + mozilla::recordreplay::AutoDisallowThreadEvents d; A comment here would be nice. Do we have to do this also in UnmarkGrayShapeRecursively and UnmarkGrayCellRecursively? ::: js/src/jscntxt.cpp @@ +258,5 @@ > * like fuzzers. > */ > fprintf(stderr, "ReportOverRecursed called\n"); > #endif > + PR_InvalidateRecording("OverRecursed exception thrown"); Also invalidate in js::ReportOutOfMemory?
Attachment #8790836 - Flags: review?(jdemooij) → review+
Comment on attachment 8790751 [details] [diff] [review] Part 4e - Don't record various JS atomics. Review of attachment 8790751 [details] [diff] [review]: ----------------------------------------------------------------- r=me with comment below addressed. ::: js/src/jit/MIRGenerator.h @@ +176,5 @@ > bool shouldForceAbort_; // Force AbortReason_Disable > ObjectGroupVector abortedPreliminaryGroups_; > bool error_; > + HelperThreadAtomicBool* pauseBuild_; > + HelperThreadAtomicBool cancelBuild_; This patch changes these atomics (and some in HelperThreads.h) from Relaxed to SequentiallyConsistent. Some of these flags are tested in loops and might be perf-sensitive. Let's not change the memory ordering in this bug to avoid hard to track down performance regressions.
Attachment #8790751 - Flags: review?(jdemooij) → review+
Comment on attachment 8790725 [details] [diff] [review] Part 1b - Add libudis86 source. Review of attachment 8790725 [details] [diff] [review]: ----------------------------------------------------------------- As you said, this still has to be added to about:license (request review from gerv).
Attachment #8790725 - Flags: review?(jdemooij) → review+
Comment on attachment 8790726 [details] [diff] [review] Part 1c - Redirections. Review of attachment 8790726 [details] [diff] [review]: ----------------------------------------------------------------- Sorry, I'm not very comfortable signing off on this. I don't know much about the low-level Darwin/Windows bits and non-SM code has its own style, conventions, etc. Nathan, maybe you can take a look or know who's a good reviewer? ::: toolkit/recordreplay/ProcessRedirect.cpp @@ +33,5 @@ > + return (uint8_t*)((size_t)aPtr & ~(PageSize - 1)); > +} > + > +static void > +MakeMemoryAccessible(uint8_t* aAddress, size_t aSize) Can we use an Auto class here to make the memory read-write and then in the destructor change to read-execute? ::: toolkit/recordreplay/ProcessRedirect.h @@ +35,5 @@ > + > + // Function for which calls should be redirected to targetFunction. > + uint8_t* mOriginalFunction; > + > + // Record/replay function with the same signature as originalFunction. s/originalFunction/mOriginalFunction/, also below @@ +74,5 @@ > +} > + > +// Workaround limits in C++ on casting between function pointers and data pointers. > +template <typename To> To CastFromVoid(void* aFrom) { return *reinterpret_cast<To*>(&aFrom); } > +template <typename From> void* CastToVoid(From aFrom) { return *reinterpret_cast<void**>(&aFrom); } JS_FUNC_TO_DATA_PTR and JS_DATA_TO_FUNC_PTR just do a mozilla::BitwiseCast<T>. @@ +82,5 @@ > + > +struct AutoRecordReplayFunctionVoid > +{ > + Thread* mThread; > + ErrorType mError; Please add a brief comment for each of these members. @@ +89,5 @@ > + : mThread(Thread::CurrentMaybePassedThrough(false)), > + mError(0), mCallId(aCallId), mCallName(aCallName) > + { > + if (mThread) > + mThread->SetPassThrough(true); Nit: Gecko style wants braces here and elsewhere @@ +475,5 @@ > +extern Atomic<size_t, SequentiallyConsistent, AtomicRecording::PreserveNothing> gMemoryLeakBytes; > + > +template <typename T> > +static inline T* > +NewLeakyArray(size_t aSize) This needs a comment at least.
Attachment #8790726 - Flags: review?(jdemooij) → review?(nfroyd)
Comment on attachment 8790821 [details] [diff] [review] Part 6a - Disable media elements when recording or replaying. Review of attachment 8790821 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/html/HTMLMediaElement.cpp @@ +408,5 @@ > RefPtr<HTMLMediaElement> element; > element.swap(mElement); > > + // Media element playback is not currently supported when recording or > + // replaying. I'm fine with this for now, but let's file a followup bug to design and implement some emulation/simulation (in a replay-able form) of a media element. roc can probably provide some ideas to start with on what would be feasible. ::: dom/media/webaudio/AudioContext.cpp @@ +176,5 @@ > AudioContext::Constructor(const GlobalObject& aGlobal, > AudioChannel aChannel, > ErrorResult& aRv) > { > + // Audio playback is not yet supported when recording or replaying. File a separate followup for this ::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp @@ +1361,5 @@ > { > PC_AUTO_ENTER_API_CALL(false); > MOZ_ASSERT(aRetval); > > + if (PR_IsRecordingOrReplaying()) { And file a followup for this, though this will be rather tough to do - but you might be able to mimic a peerconnection well enough to allow debugging code with it. Not a real connection of course. CC myself, drno, bwc, pkerr, dminor and mreavy
Attachment #8790821 - Flags: review?(rjesup) → review+
(In reply to Jim Blandy :jimb from comment #243) > I think it would be extremely valuable for people (not least us!) to have > some way to experiment with WebReplay before the fully-designed UI is ready. > Could we land rough work behind a pref? What's the best way to make this > happen? Yeah, we can do that. It looks like this will be behind the `devtools.recordreplay.enabled` pref, which will be default off? If so, we can go ahead and add some basic UI to the new debugger behind this pref and even land it, even if this hasn't landed it. When this lands, the UI will be ready to go.
Flags: needinfo?(jlong)
Comment on attachment 8790839 [details] [diff] [review] Part 8e - Don't incorporate environment into random number seed when recording or replaying. Review of attachment 8790839 [details] [diff] [review]: ----------------------------------------------------------------- This file is a mess and should be rewritten soonish. While the change is ok in general this requires a new NSPR release that would be required by NSS. Since that's a bigger thing please separate out those things, i.e. an NSS bug that's blocked by the NSPR changes. Then we can track progress there and make sure things get done the right way.
Attachment #8790839 - Flags: review?(franziskuskiefer) → review-
I might be misunderstanding the big picture, here, but it looks like WebReplay records absolutely everything in the content process. All graphics operations are going in the DrawTarget recording, infra, all atomic reads and writes that graphics do to synchronize with the compositor are instrumented, etc. What surprises me is that graphics (canvas aside) is purely a result of layout. If you can replay layout, you should be able to replay graphics without recording it. Layout itself can probably be replayed from a limited set of inputs. Is it possible to reduce the amount of things that need to be recorded and replayed in order for this to work under this design? I am very worried that this amount of instrumentation may be fragile and may become a large maintenance burden on everything that lives in the content process. The bug title says "prototype". Are these patches intended to land or are they more intended as requests for feedback?
Flags: needinfo?(bhackett1024)
(In reply to James Long (:jlongster) from comment #242) > @bhackett There was a frontend devtools patch that I r-'ed above. This bug > already is huge, and I think it would be best if you landed this feature > without any frontend changes. After all of the implementation lands, we can > land the UI to expose it in a separate bug. That way this bug is more > focused, and it'll be easier to help you land the frontend changes without > the discussion about backend details. > > I say this also because we're developing the debugger outside of m-c now, so > it'll be easier to coordinate those changes in a more focused bug. We also > need to work with out designer to implement the right UX. Hi, my main concern is that without some changes to the debugger client, a lot of the features in this bug can't be tested at all. The automated test in part 17 needs new functionality from the client to detect when the replay has finished, and future automated tests that exercise rewinding etc. will need to use the debugger. I guess I don't see what's so bad about landing some changes to the existing deprecated client when it is the only in-tree way to access most of what this bug is doing. All user visible changes to the UI which this bug makes are gated on the devtools.recordreplay.enabled pref, which is, yeah, off by default (and for now only available at all on nightly mac builds).
(In reply to Nicolas Silva [:nical] from comment #251) > I might be misunderstanding the big picture, here, but it looks like > WebReplay records absolutely everything in the content process. All graphics > operations are going in the DrawTarget recording, infra, all atomic reads > and writes that graphics do to synchronize with the compositor are > instrumented, etc. Not quite everything in the content process is recorded/replayed exactly. Besides the allowed non-determinism discussed in the design document (see the URL field at the top of the bug) we generally replay all behavior except for what goes on under system/library calls which we have redirected. There is an exception for draw targets, though. While we are replaying we never create a native draw target (e.g. DrawTargetSkia) but rather create a DrawTargetRecording which wraps a DrawTargetRecordReplay (which wraps nothing). The recorded draw events are sent to the middleman process for rendering in a native draw target, which then sends the rendered data back to the replaying process in case any content needs it. > What surprises me is that graphics (canvas aside) is purely a result of > layout. If you can replay layout, you should be able to replay graphics > without recording it. Layout itself can probably be replayed from a limited > set of inputs. Yes, layout and the commands issued to draw targets are being replayed exactly. Most of the gfx changes are related to the special record/replay draw targets described above. > Is it possible to reduce the amount of things that need to be recorded and > replayed in order for this to work under this design? As far as graphics code goes, yes, before the patch in comment 81 we did not modify the draw target code at all and just redirected calls to native drawing functions (e.g. CGContextDrawImage). But since even then the replaying process does not reproduce what goes on inside those native drawing functions, we need some way of getting the rendered graphics data. Just including the graphics data in the recording consumes an enormous amount of space (per comment 77, up to 95% of the recording on wikipedia). We could redirect the native drawing functions and then replay them in the middleman process, which is what the patches in comments 77 and 80 do, but this approach is effectively duplicating what DrawTargetRecording does, except in a platform specific way, so later patches started using DrawTargetRecording. > I am very worried that this amount of instrumentation may be fragile and may > become a large maintenance burden on everything that lives in the content > process. Which instrumentation are you mainly concerned about? I've been trying to minimize the amount of instrumentation needed (FWIW it used to be a lot worse, especially for hash tables and objective C messages), but there are more steps I can take to reduce the changes to existing code. Like the draw target issue that motivated the redesign in comment 81, though, such steps might end up duplicating functionality within the code base and not really be good design improvements. > The bug title says "prototype". Are these patches intended to land or are > they more intended as requests for feedback? These patches are intended to land. I'll change the bug title.
Flags: needinfo?(bhackett1024)
Summary: Build a web replay prototype → Web replay initial landing
Attachment #8790823 - Flags: review?(dvander) → review+
Comment on attachment 8790818 [details] [diff] [review] Part 5n - Don't perform telemetry while recording or replaying. Review of attachment 8790818 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/components/telemetry/TelemetryHistogram.cpp @@ +1726,5 @@ > // it is used, and (2) because it is never de-initialised, and > // a normal Mutex would show up as a leak in BloatView. StaticMutex > // also has the "OffTheBooks" property, so it won't show as a leak > // in BloatView. > +static StaticMutexNotRecorded gTelemetryHistogramMutex; Do we need to do the same for TelemetryScalar.cpp? @@ +1762,5 @@ > MOZ_ASSERT(!gInitDone, "TelemetryHistogram::InitializeGlobalState " > "may only be called once"); > > + gCanRecordBase = canRecordBase && !PR_IsRecordingOrReplaying(); > + gCanRecordExtended = canRecordExtended && !PR_IsRecordingOrReplaying(); What problems specifically is this solving? (1) That we shouldn't collect Telemetry from the record/replay content processes? Or (2) does it solve specific implementation issues with the patch here? I assume we should do (1), as rr content processes seem to be a special case that would bias our measurements. We should do that one level higher in Telemetry.cpp instead of here though, to also cover TelemetryScalar etc.
Attachment #8790818 - Flags: review?(gfritzsche)
Attachment #8790724 - Flags: review?(wtc)
Attachment #8790736 - Flags: review?(wtc)
Attachment #8790764 - Flags: review?(wtc)
Attachment #8790840 - Flags: review?(wtc)
Attachment #8790837 - Flags: review?(jorendorff) → review+
(In reply to Brian Hackett (:bhackett) from comment #253) > Not quite everything in the content process is recorded/replayed exactly. From your answer the situation is not as scary as I thought it was. There are still things that I don't understand. > Yes, layout and the commands issued to draw targets are being replayed > exactly. Most of the gfx changes are related to the special record/replay > draw targets described above. Ok, so graphics is not actually recorded and replayed. The replaying process replays some inputs which eventually cause layout to be calculated and graphics is naturally executed as a result of that. The use of DrawTargetRecording is purely to remote the actual drawing to the middleman process, and not actually part of the recording. Am I right? > Which instrumentation are you mainly concerned about? We should be able to treat gfx as a black box. Display lists come in, layers transactions come out, and the underlying non-determinism (let's say some things are being processed in parallel) should not have any observable effect (if there are observable effects that's likely a bug that we should fix). I don't think you need to instrument the shared memory read locks, for instance (Attachment 8790842 [details] [diff]). These are used internally to see if we can eagerly recycle an already allocated shared texture or if the compositor is still reading from it. During replay, if timings play out differently, a new buffer will be allocated where a previously allocated buffer would have been reused, but the same graphics will be rendered in the end, and the differences in the execution are happening at a level that is not observable by web devs using the tool. Likewise, Attachment 8790875 [details] [diff] seems to indicate that this work relies on layers shmem allocations happening identically in the recording and replaying phases (I think?). We should be fine just replaying shmem (and everything else layers-related) differently as long as what originated caused rendering to happen comes in with the same values.
(In reply to Nicolas Silva [:nical] from comment #255) > Ok, so graphics is not actually recorded and replayed. The replaying process > replays some inputs which eventually cause layout to be calculated and > graphics is naturally executed as a result of that. The use of > DrawTargetRecording is purely to remote the actual drawing to the middleman > process, and not actually part of the recording. Am I right? Yes. > We should be able to treat gfx as a black box. Display lists come in, layers > transactions come out, and the underlying non-determinism (let's say some > things are being processed in parallel) should not have any observable > effect (if there are observable effects that's likely a bug that we should > fix). > > I don't think you need to instrument the shared memory read locks, for > instance (Attachment 8790842 [details] [diff]). These are used internally to > see if we can eagerly recycle an already allocated shared texture or if the > compositor is still reading from it. During replay, if timings play out > differently, a new buffer will be allocated where a previously allocated > buffer would have been reused, but the same graphics will be rendered in the > end, and the differences in the execution are happening at a level that is > not observable by web devs using the tool. > > Likewise, Attachment 8790875 [details] [diff] seems to indicate that this > work relies on layers shmem allocations happening identically in the > recording and replaying phases (I think?). We should be fine just replaying > shmem (and everything else layers-related) differently as long as what > originated caused rendering to happen comes in with the same values. Unfortunately, this would introduce a level of non-determinism which we wouldn't be able to cope with. There are more details under 'Allowed Non-determinism' in the design document, but we want the replay to be more deterministic than what is required for preserving observable behaviors for web developers. Problems with allowing more non-determinism mainly stem from how we replay system/library calls which the replaying content makes. Allocating a shared memory buffer will require the replaying content to communicate with another process over IPDL, calling sendmsg, recvmsg, and other system functions. When we are replaying, the gfx IPDL messages are not actually communicating with another process, and when these system calls are invoked we simply replay what happened during the original recording. If these calls are different during the replay than they were during the recording, we won't know what to do.
Comment on attachment 8790724 [details] [diff] [review] Part 1a - Record/replay/rewind infrastructure. Canceling review of this patch for now. I didn't realize the ramifications of making changes to NSPR, so will try to minimize changes there and move the main public API to mfbt (JS needs it, so it can't be in XUL headers). I'll also experiment with making the isRecordingOrReplaying static boolean a local symbol to whatever library is testing it, and with reducing hashtable overhead. In any case, other patches (and the rest of this patch) won't change other than for the renaming of API functions (PR_RecordReplaySomething -> mozilla::recordreplay::Something).
Attachment #8790724 - Flags: review?(wmccloskey)
(In reply to Jan de Mooij [:jandem] from comment #247) > Comment on attachment 8790726 [details] [diff] [review] > Part 1c - Redirections. > > Review of attachment 8790726 [details] [diff] [review]: > ----------------------------------------------------------------- > > Sorry, I'm not very comfortable signing off on this. I don't know much about > the low-level Darwin/Windows bits and non-SM code has its own style, > conventions, etc. Nathan, maybe you can take a look or know who's a good > reviewer? Sorry, I thought I replied to this yesterday, but I must have forgotten to hit submit or something. I can at least give things a good once-over, but I won't be able to give this the chunk of time it deserves until early next week.
(In reply to Brian Hackett (:bhackett) from comment #256) > (In reply to Nicolas Silva [:nical] from comment #255) > > Ok, so graphics is not actually recorded and replayed. The replaying process > > replays some inputs which eventually cause layout to be calculated and > > graphics is naturally executed as a result of that. The use of > > DrawTargetRecording is purely to remote the actual drawing to the middleman > > process, and not actually part of the recording. Am I right? > > Yes. Ok great. > > Unfortunately, this would introduce a level of non-determinism which we > wouldn't be able to cope with. [...] This is unfortunate. If we do this it means we'll have to strive to remove as much logic from the content process as possible and this type of constraint will give us more to worry about as we add parallelism to our code à la servo. We should look into the configurations that limit the dependencies on ipc, cross-process atomics and the likes. There are certainly a few knobs we can turn on the gfx side to use the slower but simpler paths during recording and replay.
(In reply to Nicolas Silva [:nical] from comment #259) > This is unfortunate. If we do this it means we'll have to strive to remove > as much logic from the content process as possible and this type of > constraint will give us more to worry about as we add parallelism to our > code à la servo. I don't understand your concerns. The presence of the record/replay system shouldn't affect plans for adding parallelism anywhere in m-c. We can record/replay parallel code as easily as sequential code, provided it is race free. Rust code isn't handled yet, but as soon as (or before) it is used in m-c I will fix that; I haven't looked at Rust's internals but as long as it uses the standard pthreads etc. primitives for synchronization the only required changes should be for instrumenting atomics, as in part 2a of the patches here. As things stand now, the only instrumentation anywhere in gfx which is necessary for correct replay is for the shared memory atomics. This instrumentation is straightforward.
Comment on attachment 8790827 [details] [diff] [review] Part 7 - Ensure deterministic interaction of GC with CC and object references. > nsWrapperCache::CheckCCWrapperTraversal(void* aScriptObjectHolder, > nsScriptObjectTracer* aTracer) > { >- JSObject* wrapper = GetWrapper(); >+ JSObject* wrapper = GetWrapperForGC(); this is major change to behavior. GetWrapper() does unmark graying, but this new GetWrapperForGC does not. But ok, this case this should be fine. > nsWrapperCache() : mWrapper(nullptr), mFlags(0) > { >+ PR_RecordReplayRegisterWeakPointer(this); > } > ~nsWrapperCache() > { >+ PR_RecordReplayUnregisterWeakPointer(this); What do these new PR_ methods do? I don't see they definition in the full patch. This code can be a bit hot in some microbenchmarks, so I'd like to see what the methods actually do. >+ /** >+ * Get the underlying wrapper object for use during GC. >+ */ >+ JSObject* GetWrapperForGC() const >+ { >+ return mWrapper; >+ } Please don't add this. We really should use GetWrapperPreserveColor(), since its name indicates clearly how it is different to GetWrapper(). Or at least call it GetWrapperPreserveColorForGC() or some such. Though, it is totally unclear when this method should be used and when GetWrapperPreserveColor(), so this needs some documentation about expected usage. >+ static void RecordReplayRun(void* aObj, void*, void*) >+ { >+ AsyncFreeSnowWhite* freer = reinterpret_cast<AsyncFreeSnowWhite*>(aObj); >+ freer->mActive = PR_RecordReplayValue(freer->mActive); >+ freer->mPurge = PR_RecordReplayValue(freer->mPurge); >+ freer->mContinuation = PR_RecordReplayValue(freer->mContinuation); >+ freer->Run(); >+ } >+ > void Dispatch(bool aContinuation = false, bool aPurge = false) > { > if (mContinuation) { > mContinuation = aContinuation; > } > mPurge = aPurge; >- if (!mActive && NS_SUCCEEDED(NS_DispatchToCurrentThread(this))) { >- mActive = true; >+ if (!mActive) { >+ if (PR_IsRecordingOrReplaying()) { >+ PR_RecordReplayActivateTrigger(this); >+ mActive = true; >+ } else { >+ if (NS_SUCCEEDED(NS_DispatchToCurrentThread(this))) { >+ mActive = true; >+ } >+ } > } > } > >- AsyncFreeSnowWhite() : mContinuation(false), mActive(false), mPurge(false) {} >+ AsyncFreeSnowWhite() : mContinuation(false), mActive(false), mPurge(false) >+ { >+ PR_RecordReplayRegisterTrigger(this, RecordReplayRun, nullptr, nullptr); >+ } >+ >+ ~AsyncFreeSnowWhite() >+ { >+ PR_RecordReplayUnregisterTrigger(this); >+ } To keep this code somehow maintainable, the PR_Record stuff needs comments. Why they are here and what they are doing. That applies to the whole patch. >+++ b/xpcom/base/nsCycleCollector.cpp >@@ -3824,16 +3824,19 @@ nsCycleCollector::BeginCollection(ccType > > // BeginCycleCollectionCallback() might have started an IGC, and we need > // to finish it before we run FixGrayBits. > FinishAnyIncrementalGCInProgress(); > timeLog.Checkpoint("Pre-FixGrayBits finish IGC"); > > FixGrayBits(forceGC, timeLog); > >+ if (PR_IsRecordingOrReplaying()) >+ PR_RecordReplayExecuteTriggers(); Nit, always {} with 'if' (js/* may still use unusual coding style, but this is xpcom) >+++ b/xpcom/threads/nsThread.cpp >@@ -989,16 +989,20 @@ void canary_alarm_handler(int signum) > PR_END_MACRO > > NS_IMETHODIMP > nsThread::ProcessNextEvent(bool aMayWait, bool* aResult) > { > LOG(("THRD(%p) ProcessNextEvent [%u %u]\n", this, aMayWait, > mNestedEventLoopDepth)); > >+ if (PR_IsRecordingOrReplaying() && NS_IsMainThread()) { Use (mIsMainThread == MAIN_THREAD) and not NS_IsMainThread(), and put then PR_IsRecordingOrReplaying() after the thread check r- mainly because this all needs documentation. Developers modifying this code after these patches have landed need to understand what the PR_*Record* code is doing there.
Attachment #8790827 - Flags: review?(bugs) → review-
(In reply to Olli Pettay [:smaug] from comment #261) > > nsWrapperCache() : mWrapper(nullptr), mFlags(0) > > { > >+ PR_RecordReplayRegisterWeakPointer(this); > > } > > ~nsWrapperCache() > > { > >+ PR_RecordReplayUnregisterWeakPointer(this); > > What do these new PR_ methods do? I don't see they definition in the full > patch. > This code can be a bit hot in some microbenchmarks, so I'd like to see what > the methods actually do. There is documentation for these (and other) PR_ methods in the full patch in nsprpub/pr/include/prrecordreplay.h. These two are inline functions, which test the gPRIsRecordingOrReplaying bool and do nothing else if it is false. If it is true, they call into the machinery in nsprpub/pr/src/misc/prrecordreplay.c, which calls a function pointer to RRInterface_InternalRecordReplay{Register,Unregister}WeakPointer in toolkit/recordreplay/ProcessRecordReplay.cpp. (This indirection is needed because the record/replay API is in NSPR, while the implementation is in XUL.)
Attachment #8790775 - Flags: review?(jcoppeard) → review+
Depends on: 1303779
Attachment #8790772 - Flags: review?(sphink) → review+
Depends on: 1303785
Comment on attachment 8790754 [details] [diff] [review] Part 4f - Don't record JS mutexes. Review of attachment 8790754 [details] [diff] [review]: ----------------------------------------------------------------- Sorry for taking so long to get to these reviews, I was at Strange Loop, but I'm going through them now. ::: js/src/threading/posix/Mutex.cpp @@ +31,5 @@ > if (!platformData_) > oom.crash("js::Mutex::Mutex"); > > + // There are no JS mutexes which need to have their usage recorded/replayed. > + mozilla::recordreplay::AutoPassThroughThreadEvents pt; Does this type skip recording of thread events while instantiated? If so, then I think you want to put this in the guard rather than the mutex, because we have mutexes instantiated during the whole JSRuntime's lifetime.
Attachment #8790754 - Flags: review?(nfitzgerald)
Attachment #8790765 - Flags: review?(nfitzgerald) → review+
Attachment #8790800 - Flags: review?(nfitzgerald) → review+
Attachment #8790862 - Flags: review?(nfitzgerald) → review+
Attached patch patch (obsolete) — Splinter Review
Updated rolled up patch without the changes to NSPR or NSS (these are in dependent bugs). This makes two main changes: - The main public API has been moved to mfbt/RecordReplay.{h,cpp}. It is basically the same as the old NSPR API except for function naming / namespace stuff. - I redid the instrumentation of PL and PLD hash tables. Instead of adding record/replay specific logic on every hashtable operation, the tables are now instrumented by generating custom callbacks which produce consistent hash numbers between recording and replay. This is trickier than the earlier implementation but only requires checking for record/replay when constructing or destroying a table. I'll attach new versions of parts that have changed substantially.
Attachment #8790129 - Attachment is obsolete: true
The core record/replay/rewind code and public API. The public API is now in mfbt (there is also a much smaller NSPR API in bug 1303779).
Attachment #8790724 - Attachment is obsolete: true
Attachment #8792583 - Flags: review?(wmccloskey)
The record/replay initialization calls are in bug 1303779, so now this just has teardown code.
Attachment #8790730 - Attachment is obsolete: true
Attachment #8790730 - Flags: review?(wmccloskey)
Attachment #8792585 - Flags: review?(wmccloskey)
The NSPR atomics interface changes are in bug 1303779, so now this just has changes to the mfbt Atomic<> class.
Attachment #8790736 - Attachment is obsolete: true
Attachment #8790736 - Flags: review?(jwalden+bmo)
Attachment #8792586 - Flags: review?(jwalden+bmo)
Comment on attachment 8790764 [details] [diff] [review] Part 4i - Don't record NSPR thread/lock-management atomics. This part has been moved entirely into bug 1303779.
Attachment #8790764 - Attachment is obsolete: true
Comment on attachment 8790839 [details] [diff] [review] Part 8e - Don't incorporate environment into random number seed when recording or replaying. This part has moved into bug 1303785.
Attachment #8790839 - Attachment is obsolete: true
The PL hashtable instrumentation has moved into bug 1303779, and the PLD hashtable instrumentation has been rewritten to have much less overhead.
Attachment #8790840 - Attachment is obsolete: true
Attachment #8792592 - Flags: review?(nfroyd)
(In reply to Brian Hackett (:bhackett) from comment #252) > (In reply to James Long (:jlongster) from comment #242) > > @bhackett There was a frontend devtools patch that I r-'ed above. This bug > > already is huge, and I think it would be best if you landed this feature > > without any frontend changes. After all of the implementation lands, we can > > land the UI to expose it in a separate bug. That way this bug is more > > focused, and it'll be easier to help you land the frontend changes without > > the discussion about backend details. > > > > I say this also because we're developing the debugger outside of m-c now, so > > it'll be easier to coordinate those changes in a more focused bug. We also > > need to work with out designer to implement the right UX. > > Hi, my main concern is that without some changes to the debugger client, a > lot of the features in this bug can't be tested at all. The automated test > in part 17 needs new functionality from the client to detect when the replay > has finished, and future automated tests that exercise rewinding etc. will > need to use the debugger. I guess I don't see what's so bad about landing > some changes to the existing deprecated client when it is the only in-tree > way to access most of what this bug is doing. All user visible changes to > the UI which this bug makes are gated on the devtools.recordreplay.enabled > pref, which is, yeah, off by default (and for now only available at all on > nightly mac builds). It's not terrible, but if there are a bunch of tests being written against the old debugger it's going to make it that much harder to implement it in the new debugger. It would be great if we could just implement it in the new debugger now. The new debugger is already on by default in nightly (not riding trains yet, but it will soon). Many of the frontend changes can land regardless of which debugger is used: the menu items, the debugger client update (I'm assuming there are new protocol methods), etc. At that point all we need to do on our side it add buttons to call the protocol methods, correct?
Depends on: 1303836
Comment on attachment 8790895 [details] [diff] [review] Part 15a - Client side devtools changes. The debugger/frontend changes have moved into bug 1303836.
Attachment #8790895 - Attachment is obsolete: true
Comment on attachment 8790745 [details] [diff] [review] Part 4a - Make recording optional in mozilla::RefCounted. Review of attachment 8790745 [details] [diff] [review]: ----------------------------------------------------------------- Note that we have at least two other semantic copies of this: * <http://searchfox.org/mozilla-central/source/security/sandbox/chromium/base/memory/ref_counted.h#25> * <http://searchfox.org/mozilla-central/source/media/gmp-clearkey/0.1/RefCounted.h#24> Please double check if you need to handle those as well. The latter especially since it uses std::atomic on Windows for some reason.
Attachment #8790745 - Flags: review?(ehsan) → review+
Depends on: 1303891
Comment on attachment 8790828 [details] [diff] [review] Part 8a - Manually ensure hash table iteration ordering consistency in mMutants std::set. I moved this to bug 1303891 so that it can land separately.
Attachment #8790828 - Attachment is obsolete: true
(In reply to Jeff Muizelaar [:jrmuizel] from comment #235) > ::: gfx/2d/ScaledFontMac.h > @@ +76,5 @@ > > + > > + NativeFontResourceMac(uint8_t* aData, size_t aSize) > > + : mSize(aSize) > > + { > > + mData = (uint8_t*) malloc(aSize); > > It doesn't look mData ever gets freed. The destructor for NativeFontResourceMac frees mData.
Attachment #8790884 - Flags: review- → review?(jmuizelaar)
Depends on: 1303901
Hi Brian, Can you explain more why you took this approach instead of what WebKit does? I read your design document (which is really great), but I didn't understand that section. I'm still catching up on a lot of reviews, but I should have time for this this week.
Flags: needinfo?(bhackett1024)
(In reply to Bill McCloskey (:billm) from comment #276) > Hi Brian, > Can you explain more why you took this approach instead of what WebKit does? > I read your design document (which is really great), but I didn't understand > that section. Hi Bill, I updated the "Comparison with other projects" section of the design document with some more information on this. The system library APIs are much better defined and more stable than anything we have internally in Gecko or, I assume, WebKit. Any place where internal browser APIs are instrumented to act as a recording boundary will impose a maintenance burden. The only point we do that in this project is for the DrawTarget API, and I'm wondering now if that is the best strategy and whether to go back to an approach to graphics rendering like comment 77. (Also, FWIW, way back at the start of this project I wanted to pursue an approach similar to WebKit's, but roc talked me out of it.)
Flags: needinfo?(bhackett1024)
Comment on attachment 8790738 [details] [diff] [review] Part 2b - Don't record activity in atomics unit tests. Review of attachment 8790738 [details] [diff] [review]: ----------------------------------------------------------------- ::: mfbt/tests/TestAtomics.cpp @@ +10,5 @@ > #include <stdint.h> > > using mozilla::Atomic; > using mozilla::MemoryOrdering; > +using mozilla::AtomicRecording; Alphabetical.
Attachment #8790738 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8792586 [details] [diff] [review] Part 2a - Atomics interface changes. Review of attachment 8792586 [details] [diff] [review]: ----------------------------------------------------------------- ::: mfbt/Atomics.h @@ +184,5 @@ > +namespace detail { > + > +template <AtomicRecording Recording> struct AutoRecordAtomicAccess; > + > +template <> <> and such go immediately adjacent to template, throughout. @@ +193,5 @@ > + > +template <> > +struct AutoRecordAtomicAccess<AtomicRecording::PreserveOrdering> { > + AutoRecordAtomicAccess() { recordreplay::BeginOrderedAtomicAccess(); } > + ~AutoRecordAtomicAccess() { recordreplay::EndOrderedAtomicAccess(); } So the theory is a global load of a bool will be super-well-predicted or something, and then there's ~zero cost to this at all times in the future, unless recording (in which case there's an indirect function pointer call, an NSPR lock, and untold other gunk)? I'm as leery as the next person of this being true, but I guess at worst there's only shipping record/replay in dev edition or something. :-\ @@ +653,5 @@ > } > > private: > + template<MemoryOrdering AnyOrder, AtomicRecording AnyRecording> > + AtomicBase(const AtomicBase<T, AnyOrder, AnyRecording>& aCopy) = delete; While you're here, make this AtomicBase(const AtomicBase& aCopy) = delete; Templated copy constructors actually *aren't* formally copy constructors. And while they do participate in overloading and such, I'm not 100% confident this inhibits creation of the implicit copy constructor. Better to remove all the templating to be 100% confident. @@ +675,5 @@ > T operator--() { return Base::Intrinsics::dec(Base::mValue) - 1; } > > private: > + template<MemoryOrdering AnyOrder, AtomicRecording AnyRecording> > + AtomicBaseIncDec(const AtomicBaseIncDec<T, AnyOrder, AnyRecording>& aCopy) = delete; Make this non-templated, too. @@ +751,5 @@ > return Base::Intrinsics::and_(Base::mValue, aVal) & aVal; > } > > private: > + Atomic(Atomic<T, Order, Recording>& aOther) = delete; Make this an untemplated copy constructor, too. @@ +788,1 @@ > Atomic(Atomic<T*, Order>& aOther) = delete; And this while you're in the area. @@ +807,5 @@ > > using Base::operator=; > > private: > + Atomic(Atomic<T, Order, Recording>& aOther) = delete; And this. @@ +858,5 @@ > return Base::compareExchange(aOldValue, aNewValue); > } > > private: > + Atomic(Atomic<bool, Order, Recording>& aOther) = delete; And this. And if there were any others in this file needing changes, please change them. Feel free to make all these template-tweaks in a distinct rev if you want, but do make them.
Attachment #8792586 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8792592 [details] [diff] [review] Part 8f - Ensure that PLD hashtables have consistent iteration order when recording/replaying. Review of attachment 8792592 [details] [diff] [review]: ----------------------------------------------------------------- I like this approach much better. r=me ::: xpcom/glue/PLDHashTable.h @@ +282,1 @@ > // destructor, which the move assignment operator does. This comment doesn't seem entirely correct; a table with an empty mEntryStore, as we're doing below, won't execute any code that depends on mOps or mEntrySize. Ah, I see, we depend on some of these fields in the move assignment operator. Can you rewrite the comment to clarify that?
Attachment #8792592 - Flags: review?(nfroyd) → review+
Blocks: 1304146
Blocks: 1304147
Blocks: 1304149
Comment on attachment 8792583 [details] [diff] [review] Part 1a - Record/replay/rewind infrastructure. Review of attachment 8792583 [details] [diff] [review]: ----------------------------------------------------------------- ::: mfbt/RecordReplay.cpp @@ +124,5 @@ > + return rv; > +} > + > +// Workaround limits in C++ on casting between function pointers and data pointers. > +template <typename T> void CastFromVoid(void* aFrom, T* aTo) { *aTo = *reinterpret_cast<T*>(&aFrom); } Use mozilla::BitwiseCast in Casting.h instead of this. ::: mfbt/RecordReplay.h @@ +241,5 @@ > +/////////////////////////////////////////////////////////////////////////////// > +// API inline function implementation > +/////////////////////////////////////////////////////////////////////////////// > + > +#define MakeRecordReplayWrapperVoid(aName, aFormals, aActuals) \ These macros should be MOZ_-prefixed/named, please.
(In reply to Georg Fritzsche [:gfritzsche] from comment #254) > @@ +1762,5 @@ > > MOZ_ASSERT(!gInitDone, "TelemetryHistogram::InitializeGlobalState " > > "may only be called once"); > > > > + gCanRecordBase = canRecordBase && !PR_IsRecordingOrReplaying(); > > + gCanRecordExtended = canRecordExtended && !PR_IsRecordingOrReplaying(); > > What problems specifically is this solving? > (1) That we shouldn't collect Telemetry from the record/replay content > processes? > Or (2) does it solve specific implementation issues with the patch here? It seems better to not collect Telemetry measurements while recording/replaying (since record/replay turns off a number of features like incremental GC), but the main issue is that Telemetry measurements can happen during GC and at other points which execute non-deterministically. If collecting the telemetry performs actions like taking locks or interacting with the system at all, we won't be able to replay those actions reliably.
Updated patch per review comments.
Attachment #8790818 - Attachment is obsolete: true
Attachment #8793044 - Flags: review?(gfritzsche)
Comment on attachment 8790726 [details] [diff] [review] Part 1c - Redirections. Review of attachment 8790726 [details] [diff] [review]: ----------------------------------------------------------------- First pass: just ProcessRedirect.*, plus one or two other things I happened to notice while trying to figure out how all the pieces work. I didn't review the mechanics of crafting the redirections too closely, or double-checking on all the opcodes being used, etc. Block comments in ProcessRedirect.h (and ProcessRedirect.cpp) that talk about what we're doing, why we're doing it, and maybe even some of the mechanics for the different architectures, would be great. Jan's comments all seem appropriate. Leaving the review open to remind me that I have to look at the OS X and Windows redirections. ::: toolkit/recordreplay/ProcessRedirect.cpp @@ +29,5 @@ > + > +static uint8_t* > +PageStart(uint8_t* aPtr) > +{ > + return (uint8_t*)((size_t)aPtr & ~(PageSize - 1)); Nit: uintptr_t instead of size_t. I would cheer if we used reinterpret_cast and static_cast everywhere appropriate, but given such low-level code, enforcing that restriction here might lead to quite some verbosity. WDYT? @@ +47,5 @@ > + DWORD oldProtect; > + if (!VirtualProtect(pageStart, pageEnd - pageStart, PAGE_EXECUTE_READWRITE, &oldProtect)) > + MOZ_CRASH(); > +#else > + #error "Unknown platform" I'm assuming there are appropriate moz.build files or configury magic that ensures we're not compiling this on unsupported platforms? @@ +132,5 @@ > + uint8_t* mStart; > + uint8_t* mTarget; > + bool mShort; > +}; > +static Vector<JumpPatch> gJumpPatches; The static global vectors throughout are going to add static constructors to libxul on some platforms (notably Android), which we try to avoid. Could we gather them up into some sort of global data structure that gets allocated prior to the patching process, and then deleted afterwards? @@ +144,5 @@ > + patch.mStart = aStart; > + patch.mTarget = aTarget; > + patch.mShort = aShort; > + if (!gJumpPatches.append(patch)) > + MOZ_CRASH(); These helper functions (here and elsewhere) would be shorter and possibly clearer if they were just: if (!gJumpPatches.emplaceBack(aStart, aTarget, aShort)) { MOZ_CRASH(); } though I guess that requires writing constructors for them...the generated code might be better, too? @@ +168,5 @@ > + > +#if defined(XP_MACOSX) > +#define HAS_CPU_X64 > +#elif defined(WIN32) > +#define HAS_CPU_X86 How hard is it to support 64-bit Windows in all of this? I don't think we'd want to land this without 64-bit Windows support; we're trying to get to the point where we really recommend 64-bit Firefox to developers, and it's not going to be very nice to have a cool debugging tool that's not supported there. @@ +184,5 @@ > + // movq $aTarget, rax > + *(aIp++) = 0x40 | (1 << 3); > + *(aIp++) = 0xB8; > + *(void**)aIp = aTarget; > + aIp += 8; Nit: this looks like dead code. @@ +213,5 @@ > + > + *(aIp++) = 0x66; > + *(aIp++) = 0x68; > + *(uint16_t*)aIp = ntarget >> 48; > + aIp += 2; All of this is essentially doing: pushq $aTarget ret right (assuming we actually had a 64-bit pushq)? Would be good to document, since we've got all these magic opcodes floating about. Might be worth a comment saying that we break this up into 16-bit immediates, as pushing 32-bit immediates would be sign-extended in 64-bit mode and pushed as 64-bit quantities. @@ +369,5 @@ > + return -1; > +} > + > +static bool > +ByteMatch(uint8_t aByte, const char* aStr, ...) This function is...wow. I think you could get much more idiomatic code if you wrote it with variable-argument template functions, something like: template<typename Head, typename... Tail> bool ByteMatch(uint8_t aByte, Head aValue, Tail aMoreValues...) { if (aByte == aValue) { return true; } return ByteMatch(aByte, aMoreValues...); } bool ByteMatch(uint8_t aByte) { // Nothing to match. return false; } Admittedly gnarly, but at least this way, we're not parsing strings at runtime... @@ +402,5 @@ > + size_t nbytes = 0; > + uint8_t nconsumed = 0; > + > + // Watch for prefaces to branch instructions that give a hint to the branch > + // predictor. Note that this preface might appear before other instructions Nit: I think the usual terminology here is "prefixes" and "prefix". @@ +707,5 @@ > + > + // Don't copy call and jump instructions. We should have special cased these. > + ud_mnemonic_code_t mnemonic = ud_insn_mnemonic(&ud); > + if (mnemonic == UD_Icall || (mnemonic >= UD_Ijo && mnemonic <= UD_Ijmp)) > + return UnknownInstruction(aName, aIp); Isn't it more appropriate to assert here, rather than returning something unknown, given the comment above? @@ +921,5 @@ > + MakeMemoryAccessible(functionStart, JumpBytesClobberRax); > + } > + > + uint8_t* cursor, *end; > + AllocateRedirectionBuffer(&cursor, &end); This leaks the redirection buffer, yes? I see a couple other things that look like deliberate leaks that we're going to have to handle appropriate for our ASan builds, etc. ::: toolkit/recordreplay/ProcessRedirect.h @@ +32,5 @@ > + // DLL containing this function. > + const char* mDllName; > +#endif > + > + // Function for which calls should be redirected to targetFunction. Nit: mTargetFunction. ::: toolkit/recordreplay/ProcessRedirectDarwin.cpp @@ +392,5 @@ > + > + RecordReplayFunction(recvmsg, ssize_t, (aSockFd, aMsg, aFlags)); > + > + for (size_t i = 0; i < 2 + initialLengths[1]; i++) > + events.CheckInput(initialLengths[i]); What happens here when PR_AreThreadEventsPassedThrough() is true, and we don't initialize initialLengths? @@ +472,5 @@ > + > +static ssize_t > +RR_mprotect(void* aAddress, size_t aSize, int aFlags) > +{ > + // Ignore calls to mprotect while replaying. This function interferes with Are there security implications from this? Being able to potentially scribble on JIT'd code makes me nervous. ::: toolkit/recordreplay/ProcessRedirectWindows.cpp @@ +4192,5 @@ > + FOR_EACH_KERNEL32_REDIRECTION(MAKE_KERNEL32_REDIRECTION_ENTRY) > + FOR_EACH_SHELL32_REDIRECTION(MAKE_SHELL32_REDIRECTION_ENTRY) > + FOR_EACH_USER32_REDIRECTION(MAKE_USER32_REDIRECTION_ENTRY) > + FOR_EACH_DLL_REDIRECTION(MAKE_DLL_REDIRECTION_ENTRY) > +}; Nit: it looks like the Windows version is missing a sentinel entry.
Updated patch per review comments.
Attachment #8790827 - Attachment is obsolete: true
Attachment #8793108 - Flags: review?(bugs)
(In reply to Nick Fitzgerald [:fitzgen] [⏰PDT; UTC-7] from comment #263) > ::: js/src/threading/posix/Mutex.cpp > @@ +31,5 @@ > > if (!platformData_) > > oom.crash("js::Mutex::Mutex"); > > > > + // There are no JS mutexes which need to have their usage recorded/replayed. > > + mozilla::recordreplay::AutoPassThroughThreadEvents pt; > > Does this type skip recording of thread events while instantiated? If so, > then I think you want to put this in the guard rather than the mutex, > because we have mutexes instantiated during the whole JSRuntime's lifetime. AutoPassThroughThreadEvents causes thread events to be passed through to the underlying system without being recorded or replayed. If events are being passed through when a mutex is created, though, then no locking events on that mutex will be recorded/replayed. i.e. for a lock event to be recorded both a) the mutex had to have been created when events were not passed through, and b) the lock event itself has to happen when events are not passed through. I can beef up the comment here if you want.
Attachment #8790754 - Flags: review?(nfitzgerald)
(In reply to Brian Hackett (:bhackett) from comment #286) > (In reply to Nick Fitzgerald [:fitzgen] [⏰PDT; UTC-7] from comment #263) > > ::: js/src/threading/posix/Mutex.cpp > > @@ +31,5 @@ > > > if (!platformData_) > > > oom.crash("js::Mutex::Mutex"); > > > > > > + // There are no JS mutexes which need to have their usage recorded/replayed. > > > + mozilla::recordreplay::AutoPassThroughThreadEvents pt; > > > > Does this type skip recording of thread events while instantiated? If so, > > then I think you want to put this in the guard rather than the mutex, > > because we have mutexes instantiated during the whole JSRuntime's lifetime. > > AutoPassThroughThreadEvents causes thread events to be passed through to the > underlying system without being recorded or replayed. If events are being > passed through when a mutex is created, though, then no locking events on > that mutex will be recorded/replayed. i.e. for a lock event to be recorded > both a) the mutex had to have been created when events were not passed > through, and b) the lock event itself has to happen when events are not > passed through. I can beef up the comment here if you want. I totally misread this patch, and thought that the AutoPassThroughThreadEvents was being added as a member to js::Mutex.
Comment on attachment 8790754 [details] [diff] [review] Part 4f - Don't record JS mutexes. Review of attachment 8790754 [details] [diff] [review]: ----------------------------------------------------------------- LGTM
Attachment #8790754 - Flags: review?(nfitzgerald) → review+
(In reply to Nathan Froyd [:froydnj] from comment #284) > ::: toolkit/recordreplay/ProcessRedirect.cpp > @@ +29,5 @@ > > + > > +static uint8_t* > > +PageStart(uint8_t* aPtr) > > +{ > > + return (uint8_t*)((size_t)aPtr & ~(PageSize - 1)); > > Nit: uintptr_t instead of size_t. > > I would cheer if we used reinterpret_cast and static_cast everywhere > appropriate, but given such low-level code, enforcing that restriction here > might lead to quite some verbosity. WDYT? Hmm, I'll try to use C++ style casts, but I'd like the code to be clear and sometimes that's hard with idiomatic low-level C++ (see also: BitwiseCast). > @@ +47,5 @@ > > + DWORD oldProtect; > > + if (!VirtualProtect(pageStart, pageEnd - pageStart, PAGE_EXECUTE_READWRITE, &oldProtect)) > > + MOZ_CRASH(); > > +#else > > + #error "Unknown platform" > > I'm assuming there are appropriate moz.build files or configury magic that > ensures we're not compiling this on unsupported platforms? So far I've only compiled on Mac and Windows, and still need to fix up some of these #ifdefs so that this compiles everywhere (right now I'm not planning on tryserver'ing things until all the reviews are in). > @@ +132,5 @@ > > + uint8_t* mStart; > > + uint8_t* mTarget; > > + bool mShort; > > +}; > > +static Vector<JumpPatch> gJumpPatches; > > The static global vectors throughout are going to add static constructors to > libxul on some platforms (notably Android), which we try to avoid. Could we > gather them up into some sort of global data structure that gets allocated > prior to the patching process, and then deleted afterwards? Sure, I'll also check the other new files for static variables with constructors. > @@ +168,5 @@ > > + > > +#if defined(XP_MACOSX) > > +#define HAS_CPU_X64 > > +#elif defined(WIN32) > > +#define HAS_CPU_X86 > > How hard is it to support 64-bit Windows in all of this? I don't think we'd > want to land this without 64-bit Windows support; we're trying to get to the > point where we really recommend 64-bit Firefox to developers, and it's not > going to be very nice to have a cool debugging tool that's not supported > there. Well, the windows port isn't ready yet (I'll be starting work on that again soon) but once it is there shouldn't be much difficulty in getting 64-bit windows to work. Since all the system APIs are the same (I hope), just the low level ABI and codegen stuff should need fixing, which isn't too extensive. > @@ +369,5 @@ > > + return -1; > > +} > > + > > +static bool > > +ByteMatch(uint8_t aByte, const char* aStr, ...) > > This function is...wow. I think you could get much more idiomatic code if > you wrote it with variable-argument template functions, something like: > > template<typename Head, typename... Tail> > bool > ByteMatch(uint8_t aByte, Head aValue, Tail aMoreValues...) > { > if (aByte == aValue) { > return true; > } > return ByteMatch(aByte, aMoreValues...); > } > > bool > ByteMatch(uint8_t aByte) > { > // Nothing to match. > return false; > } > > Admittedly gnarly, but at least this way, we're not parsing strings at > runtime... Err, yeah, this function dates from back when all of this was C code. I'll rewrite it. > @@ +707,5 @@ > > + > > + // Don't copy call and jump instructions. We should have special cased these. > > + ud_mnemonic_code_t mnemonic = ud_insn_mnemonic(&ud); > > + if (mnemonic == UD_Icall || (mnemonic >= UD_Ijo && mnemonic <= UD_Ijmp)) > > + return UnknownInstruction(aName, aIp); > > Isn't it more appropriate to assert here, rather than returning something > unknown, given the comment above? Well, the special casing for call, jump, and ip relative instructions is incomplete and there could be other instructions that slip through. Calling UnknownInstruction() here will cause us to remember the instruction we didn't understand, and Redirect() will end up returning false and we'll fail at the end of the redirection process (right now we just crash). > ::: toolkit/recordreplay/ProcessRedirectDarwin.cpp > @@ +392,5 @@ > > + > > + RecordReplayFunction(recvmsg, ssize_t, (aSockFd, aMsg, aFlags)); > > + > > + for (size_t i = 0; i < 2 + initialLengths[1]; i++) > > + events.CheckInput(initialLengths[i]); > > What happens here when PR_AreThreadEventsPassedThrough() is true, and we > don't initialize initialLengths? When events are being passed through then the RecordReplayFunction macro boils down to a call to the underlying function that was redirected, and then an immediate return of that call's rval. > @@ +472,5 @@ > > + > > +static ssize_t > > +RR_mprotect(void* aAddress, size_t aSize, int aFlags) > > +{ > > + // Ignore calls to mprotect while replaying. This function interferes with > > Are there security implications from this? Being able to potentially > scribble on JIT'd code makes me nervous. I doubt I'm the best person to make a judgement call here, but I don't think the security concerns when recording or replaying are as extensive as for normal content processes, since the user has to have had started recording or replaying a page deliberately (also, until recently we always had RWX jitcode anyhow). On a related note, though, I've wondered about whether there are security concerns in potentially allowing extensions to record pages on the sly, which I assume they can do since they are chrome privileged. I don't know if this would give them any more power than they already have, though, or whether we take (or could take) defenses against malicious extensions.
Comment on attachment 8793044 [details] [diff] [review] Part 5n - Don't perform telemetry while recording or replaying. Review of attachment 8793044 [details] [diff] [review]: ----------------------------------------------------------------- Thanks, this looks good to me.
Attachment #8793044 - Flags: review?(gfritzsche) → review+
Comment on attachment 8790884 [details] [diff] [review] Part 14d - Support creating native font resources for Mac draw targets. Review of attachment 8790884 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/2d/ScaledFontMac.h @@ +76,5 @@ > + > + NativeFontResourceMac(uint8_t* aData, size_t aSize) > + : mSize(aSize) > + { > + mData = (uint8_t*) malloc(aSize); Can you make mData a UniquePtr<uint8_t[]> and use MakeUnique<uint8_t[]>(aSize) instead?
Attachment #8790884 - Flags: review?(jmuizelaar) → review+
Comment on attachment 8790898 [details] [diff] [review] Part 16 - Server side devtools changes. Review of attachment 8790898 [details] [diff] [review]: ----------------------------------------------------------------- ::: devtools/server/actors/object.js @@ +115,5 @@ > let raw = this.obj.unsafeDereference(); > > // If Cu is not defined, we are running on a worker thread, where xrays > // don't exist. > + if (raw && Cu) { This needs a comment at the unsafeDereference call explaining that, in replay, it can return null or undefined or whatever it returns. It seems like there are a lot of other calls to unsafeDereference in this file, some of which refer to the raw object outside of a `try` block. Are they not covered by the tests? @@ +1487,5 @@ > return true; > } > > let raw = obj.unsafeDereference(); > + if (raw) { Similarly. @@ +1826,5 @@ > let items = grip.preview.items = []; > > let i = 0; > for (let key of keys) { > + if (rawObj && rawObj.hasOwnProperty(key) && i++ < OBJECT_PREVIEW_MAX_ITEMS) { It seems like you could put the `if (rawObj) around the entire `for` loop. ::: devtools/server/actors/script.js @@ +471,5 @@ > return this._dbg; > }, > > get globalDebugObject() { > + if (!this._parent.window || this.dbg.replaying) { How does this affect uses of globalDebugObject like the one here? http://searchfox.org/mozilla-central/rev/c31ad35f39c6187b2e121aa6d3a39b7f67397010/devtools/server/actors/script.js#1132 @@ +934,5 @@ > if (stepFrame) { > switch (steppingType) { > case "step": > + if (rewinding) { > + this.dbg.onPopFrame = onEnterFrame; I sort of get it, but it would be nice to have a comment about how rewinding is being handled here.
Attachment #8790898 - Flags: review?(jimb) → review+
Comment on attachment 8793108 [details] [diff] [review] Part 7 - Ensure deterministic interaction of GC with CC and object references. > nsWrapperCache::CheckCCWrapperTraversal(void* aScriptObjectHolder, > nsScriptObjectTracer* aTracer) > { >- JSObject* wrapper = GetWrapper(); >+ JSObject* wrapper = GetWrapperPreserveColorForGC(); This is not GC related code. So, I don't think you need any change here. > class nsWrapperCache > { > public: > NS_DECLARE_STATIC_IID_ACCESSOR(NS_WRAPPERCACHE_IID) > > nsWrapperCache() : mWrapper(nullptr), mFlags(0) > { >+ mozilla::recordreplay::RegisterWeakPointer(this); Ok, the method is inline and doesn't do much if IsRecordingOrReplaying() returns false. Some microbenchmarking will be needed anyhow. >+ /** >+ * Get the cached wrapper. >+ * >+ * As for GetWrapperPreserveColor, this getter does not change the color of >+ * the JSObject. Additionally, when recording or replaying it might not >+ * behave consistently between the two executions. Because this does not >+ * interact with the recording, it is the only getter which should be used >+ * during a GC. >+ */ I don't understand the setup here, especially I don't understand how you guarantee this is used during GC, but not elsewhere. Since same code paths, which then end up calling this or the non-ForGC case, might be used in some helper methods. FWIW, my guess is that this all will be broken immediately it lands, since it is hard to understand when one is supposed to do what. We need more documentation on this stuff, and _tests_ static void sandbox_finalize(js::FreeOp* fop, JSObject* obj) { nsIScriptObjectPrincipal* sop = static_cast<nsIScriptObjectPrincipal*>(xpc_GetJSPrivate(obj)); if (!sop) { // sop can be null if CreateSandboxObject fails in the middle. return; } static_cast<SandboxPrivate*>(sop)->ForgetGlobalObject(); - NS_RELEASE(sop); + if (recordreplay::IsRecordingOrReplaying()) { + // Trigger a later call to RecordReplayReleaseSandbox. + recordreplay::ActivateTrigger(sop); + } else { + // Release the principal reference immediately. + NS_RELEASE(sop); + } Oh, this stuff looks buggy to me, I mean even without web replay. Could you file a bug to make sandbox_finalize to release async. CC :gabor and :bholley to that bug >+// When recording or replaying we can't directly release references on the >+// sandbox principal during GC finalization (since finalization happens at >+// non-deterministic points and destroying the principal can interact with the >+// recording). Use the record/replay trigger mechanism to release the principal >+// reference at a consistent point, similar to what is done with deferred >+// finalizers. So, I think we need to fix sandbox finalization in general, and that will hopefully simplify this patch. void CycleCollectedJSRuntime::JSObjectsTenured() { MOZ_ASSERT(mJSContext); for (auto iter = mNurseryObjects.Iter(); !iter.Done(); iter.Next()) { nsWrapperCache* cache = iter.Get(); - JSObject* wrapper = cache->GetWrapperPreserveColor(); - MOZ_ASSERT(wrapper); + JSObject* wrapper = cache->GetWrapperPreserveColorForGC(); + MOZ_ASSERT_IF(!recordreplay::IsReplaying(), wrapper); So is recordreplay::IsReplaying() threadsafe? So, I think better to fix sandbox handling in a separate bug and that would simplify this stuff. Other than that, rs+, I guess, but since this patch shouldn't land, marking still r-
Attachment #8793108 - Flags: review?(bugs) → review-
Depends on: 1306280
Depends on: 1306281
(In reply to Olli Pettay [:smaug] (reviewing overload) from comment #293) > >+ /** > >+ * Get the cached wrapper. > >+ * > >+ * As for GetWrapperPreserveColor, this getter does not change the color of > >+ * the JSObject. Additionally, when recording or replaying it might not > >+ * behave consistently between the two executions. Because this does not > >+ * interact with the recording, it is the only getter which should be used > >+ * during a GC. > >+ */ > I don't understand the setup here, especially I don't understand how you > guarantee this is used during GC, but not elsewhere. > Since same code paths, which then end up calling this or the non-ForGC case, > might be used in some helper methods. > > > FWIW, my guess is that this all will be broken immediately it lands, since > it is hard to understand when one is supposed to do what. > We need more documentation on this stuff, and _tests_ Hi, in the latest patch (which I would ask for review for, but bugzilla isn't letting me) I've removed GetWrapperPreserveColorForGC, so that the nsWrapperCache interface is unchanged by the patch. (There is an existing record/replay API to detect if we are in a GC or another region of execution where thread events are not allowed, which allows GetWrapperJSObject to behave correctly no matter where it is called.) What sort of tests are you looking for? There are no unit tests for the cpp code that I can find, and the wrapper cache and deferred finalization code will be tested as part of any record/replay integration test since it is such central code to the DOM. > void > CycleCollectedJSRuntime::JSObjectsTenured() > { > MOZ_ASSERT(mJSContext); > > for (auto iter = mNurseryObjects.Iter(); !iter.Done(); iter.Next()) { > nsWrapperCache* cache = iter.Get(); > - JSObject* wrapper = cache->GetWrapperPreserveColor(); > - MOZ_ASSERT(wrapper); > + JSObject* wrapper = cache->GetWrapperPreserveColorForGC(); > + MOZ_ASSERT_IF(!recordreplay::IsReplaying(), wrapper); > > So is recordreplay::IsReplaying() threadsafe? Yes. The record/replay booleans are set during process startup when it is still single threaded, and are unset late during shutdown when the process is again single threaded.
I mean tests checking that this all keeps working, like, can we run record-replay as part of some mochitests? Some tests which cover pretty much all the code the patches in this bug is touching. But sounds like you have some tests? This all is very exiting and great when it works well, but the maintenance is non-trivial, so plenty of tests are needed to catch regressions as early as possible.
(In reply to Olli Pettay [:smaug] (reviewing overload) from comment #296) > I mean tests checking that this all keeps working, like, can we run > record-replay as part of some mochitests? Some tests which cover pretty much > all the code the patches in this bug is touching. > But sounds like you have some tests? Yes, there is currently one mochitest in part 17 which records and replays a web page, and I'll add more tests for the remaining functionality (record/replay more complex pages, rewinding, debugger integration) before landing. > This all is very exiting and great when it works well, but the maintenance > is non-trivial, so plenty of tests are needed to catch regressions as early > as possible. Yes, definitely. Also, once this lands we should be able to start fuzz testing recording/replaying in the same fashion as the record/replay mochitests.
(In reply to Brian Hackett (:bhackett) from comment #297) > (In reply to Olli Pettay [:smaug] (reviewing overload) from comment #296) > > I mean tests checking that this all keeps working, like, can we run > > record-replay as part of some mochitests? Some tests which cover pretty much > > all the code the patches in this bug is touching. > > But sounds like you have some tests? > > Yes, there is currently one mochitest in part 17 which records and replays a > web page, and I'll add more tests for the remaining functionality > (record/replay more complex pages, rewinding, debugger integration) before > landing. How resilient is the record/replay functionality in the face of calls to system functions not currently covered by the hooking functionality, but required to correctly record/replay? It seems pretty plausible that we can add new platform features that don't get exercised by any record/replay tests we have, but cause record/replay to diverge.
(In reply to Nathan Froyd [:froydnj] from comment #298) > (In reply to Brian Hackett (:bhackett) from comment #297) > > (In reply to Olli Pettay [:smaug] (reviewing overload) from comment #296) > > > I mean tests checking that this all keeps working, like, can we run > > > record-replay as part of some mochitests? Some tests which cover pretty much > > > all the code the patches in this bug is touching. > > > But sounds like you have some tests? > > > > Yes, there is currently one mochitest in part 17 which records and replays a > > web page, and I'll add more tests for the remaining functionality > > (record/replay more complex pages, rewinding, debugger integration) before > > landing. > > How resilient is the record/replay functionality in the face of calls to > system functions not currently covered by the hooking functionality, but > required to correctly record/replay? It seems pretty plausible that we can > add new platform features that don't get exercised by any record/replay > tests we have, but cause record/replay to diverge. Our handling for this is pretty good on Windows: we can hook every entry point in a given DLL and insert a check that the function is only ever called when thread events are being passed through. With this in place, if a function in the DLL which isn't hooked is directly called while recording (by new code in m-c, or by existing code which isn't covered by other record/replay tests) then we immediately print the name of the function to stderr and crash. On Macs our handling is worse. During both recording and replay the actual API will be called and it will do its normal thing instead of interacting with the recording. IME unhandled library APIs will usually crash during replay, since many of these APIs are passed pointers to other library structures, and these pointers are bogus if they were generated by another library API (the hook for that library API will just replay the raw pointer value that was produced while recording). Tracking these unhandled APIs down isn't hard, but it's a specialized skill and it would be much much better if we did something similar to what we do on Windows. In any case, after landing and at least until record/replay can be considered stable (per fuzz and user testing) it's probably best for record/replay tests to be hidden on tinderbox.
Brian, I just added another exception handler in bug 1305360. Assuming it sticks, it should be easy to disable: just add a check to MemoryProtectionExceptionHandler::isDisabled(). You might also be able to use the code as a basis for any exception handlers you decide to add after the initial landing of Web Replay: the Mach exception handler I added takes care of forwarding messages to other exception handlers in the chain, which is the most complicated part of Mach exception handling and not something we had in the tree before. The code is currently specific to one type of exception and only handles one message, but it should be simple to generalize from there.
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #300) > Brian, I just added another exception handler in bug 1305360. Assuming it > sticks, it should be easy to disable: just add a check to > MemoryProtectionExceptionHandler::isDisabled(). > > You might also be able to use the code as a basis for any exception handlers > you decide to add after the initial landing of Web Replay: the Mach > exception handler I added takes care of forwarding messages to other > exception handlers in the chain, which is the most complicated part of Mach > exception handling and not something we had in the tree before. The code is > currently specific to one type of exception and only handles one message, > but it should be simple to generalize from there. Thanks! I tried and failed to get the exception handler in this bug to forward messages to other handlers (i.e. for lldb) so I will definitely work on incorporating what you've done in bug 1305360.
Comment on attachment 8792583 [details] [diff] [review] Part 1a - Record/replay/rewind infrastructure. Review of attachment 8792583 [details] [diff] [review]: ----------------------------------------------------------------- Hi Brian, I tried to understand this as best I could but it feels almost a little hopeless. It's really hard to tell who calls what, or why they call it. There's a little bit of documentation, but there needs to be a lot more. It's also really bad that everything is grouped into two huge files. I'm stuck looking at a long list of individual methods without knowing how they fit into the whole. In some cases I think the callers are in different patches, but I don't know which ones. I've kind of given up on offering substantive comments on how to improve the code. As long as it doesn't break the rest of the browser, I guess it's probably fine. But I'd like it to be possible for someone to understand the code if they need to. I think the following would help: - Break it up into different files. There's nothing wrong with having a file with only three or four methods in it. - For each file, describe what code it's supporting (who calls it), what constraints it operates under (can it use normal allocation, locking functions), what thread it runs on, and whether it's used during recording, replay, or all execution. - Add assertions to individual functions about what threads they operate on. Comment who calls them if it's not in the same file. As a concrete example, when I first started reading the patch, I assumed that there was NSPR instrumentation that called stuff like NewLock. That doesn't seem to be the case. Eventually I realized it was in the redirections, but that took me a couple minutes. By the time I found it, I had forgotten why I even started looking. That's why comments are so important. I'm just going to post what I have so far, but I've really only gotten through the obvious parts of the patch. ::: mfbt/RecordReplay.cpp @@ +109,5 @@ > +#undef DECLARE_SYMBOL > +#undef DECLARE_SYMBOL_VOID > + > +static void* > +LoadSymbol(const char* aName) I haven't followed the linkage issues in this bug. Nathan should really review this file. Is the idea that the code lives in libxul, and JS engine callers go through these wrappers? What happens with the global variables? ::: mfbt/RecordReplay.h @@ +3,5 @@ > +/* This Source Code Form is subject to the terms of the Mozilla Public > + * License, v. 2.0. If a copy of the MPL was not distributed with this > + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */ > + > +#ifndef mozilla_RecordReplay_h___ No underscores at the end. @@ +39,5 @@ > + > +// Note: This can be called directly (i.e. AutoOrderedAtomicAccess()) to insert > +// an ordering fence that will force threads to execute in the same order > +// during replay. > +struct AutoOrderedAtomicAccess Would be nice to use the GuardObjects.h stuff here. @@ +55,5 @@ > + > +// Whether events in this thread are passed through. > +static inline bool AreThreadEventsPassedThrough(); > + > +struct AutoPassThroughThreadEvents Same here. @@ +63,5 @@ > +}; > + > +// As for AutoPassThroughThreadEvents, but may be used when events are already > +// passed through. > +struct AutoEnsurePassThroughThreadEvents And here. @@ +82,5 @@ > +private: > + bool passedThrough; > +}; > + > +// Mark a region where thread events cannot occur at all. What is a "thread event"? @@ +108,5 @@ > +// lock or an associated cvar --- this is taken care of automatically. > +// > +// Other threads, which include any thread that is replay specific and certain > +// others (like JS helper threads, which only block on unrecorded resources) > +// Remove this blank line. @@ +117,5 @@ > +// whenever the main thread forces all other threads to become idle, and at > +// most once after the call to NotifyUnrecordedWait if the main thread is > +// already waiting for other threads to become idle. The callback should > +// poke the thread so that it is no longer blocked on the resource, and will > +// call MaybeWaitForSnapshot before blocking again. Who calls MaybeWaitForSnapshot? The thread that is being woken up, or the callback? I guess the thread being woken up? @@ +118,5 @@ > +// most once after the call to NotifyUnrecordedWait if the main thread is > +// already waiting for other threads to become idle. The callback should > +// poke the thread so that it is no longer blocked on the resource, and will > +// call MaybeWaitForSnapshot before blocking again. > +MFBT_API void NotifyUnrecordedWait(void (*aCallback)(void*), void* aArgument); A C++ lambda here would be a lot cleaner. @@ +160,5 @@ > +// API for managing accesses to weak pointers which might be cleared by the JS > +// garbage collector (whose behavior may vary between recording and replay). > +// When recording or replaying, WeakPointerAccess should be called each read of > +// the weak pointer (successful or not). When replaying NextWeakPointerSucceeds > +// and SetWeakPointerJSRoot can be used to make sure that the weak pointer is It's not clear at all what the last two functions do or when they should be called. It would help to give some example weak pointer code and how these would be used. @@ +180,5 @@ > +// ActivateTrigger. ActivateTrigger may be called while recording in code that > +// runs non-deterministically. The |fn| callback associated with the trigger > +// will be called at some deterministic point in execution (under a call to > +// ExecuteTriggers) after the point in the recording where the trigger was > +// activated. I don't understand this paragraph at all. Again, an example would help. @@ +206,5 @@ > +// event which requires interacting with the system will trigger an unhandled > +// divergence from the recording and cause the debugger operation to fail. > +static inline bool HasDivergedFromRecording(); > + > +// API for use by the replay debugger. Needs more documentation. ::: toolkit/recordreplay/ProcessRecordReplay.cpp @@ +83,5 @@ > +void > +File::EnsureBallast(size_t aSize) > +{ > + if (mBallastSize < aSize) { > + mBallast = ReallocateMemory(mBallast, mBallastSize, aSize); You never make the ballast smaller. Could you just allocate it to be 64K up front? It seems like the first read is probably going to be 64K anyway if I understand correctly. @@ +121,5 @@ > + size_t compressedSize = mNextCompressedSize; > + size_t decompressedSize = mNextDecompressedSize; > + size_t nread = compressedSize + sizeof(PRUint16) * 2; > + > + EnsureBallast(nread); You're using realloc here even though the ballast doesn't hold anything you care about. Seems like a wasted memcpy. @@ +124,5 @@ > + > + EnsureBallast(nread); > + size_t res = DirectRead(mFd, mBallast, nread); > + if (res == nread) { > + mNextCompressedSize = *(uint16_t*)(mBallast + compressedSize); Seems better to have an intermediate var point to reinterpret_cast<uint16_t*>(mBallast + compressedSize) and then read from [0] and [1]. @@ +130,5 @@ > + } else if (res == compressedSize) { > + mNextCompressedSize = 0; > + mNextDecompressedSize = 0; > + } else { > + MOZ_CRASH(); Why crash here? This seems like a valid thing to have happen. You could be interrupted by a signal, for example. Maybe DirectRead handles that? @@ +134,5 @@ > + MOZ_CRASH(); > + } > + mSeekOffset += res; > + > + if (mBufferSize < decompressedSize) { Can you assert here that mBufferPos == mBufferSize? @@ +135,5 @@ > + } > + mSeekOffset += res; > + > + if (mBufferSize < decompressedSize) { > + mBuffer = ReallocateMemory(mBuffer, mBufferSize, decompressedSize); Why Reallocate? Seems like you just want to free the old one and allocate a new one. No need to copy since mBuffer seems to be unused at this point. @@ +140,5 @@ > + mBufferSize = decompressedSize; > + } > + > + size_t bytesWritten; > + if (!Compression::LZ4::decompress((const char*) mBallast, compressedSize, Could you just make mBallast and mBuffer be chars? It seems like it would eliminate some casts. Are there other places where you need them to be unsigned? @@ +165,5 @@ > + > + size_t bound = Compression::LZ4::maxCompressedSize(mBufferPos); > + MOZ_ASSERT(bound < (1 << 16)); > + > + EnsureBallast(bound + sizeof(uint16_t) * 2); Again, seems like you don't need the memcpy. @@ +174,5 @@ > + MOZ_CRASH(); > + MOZ_ASSERT((size_t)compressedSize <= bound); > + > + *(uint16_t*)mBallast = compressedSize; > + *(uint16_t*)(mBallast + sizeof(uint16_t)) = mBufferPos; Again, an intermediate var here would clean this up. @@ +200,5 @@ > + aData = (uint8_t*)aData + bufWrite; > + aSize -= bufWrite; > + > + // Grow the file's buffer if it is not at its maximum size. > + if (mBufferSize < BUFFER_MAX) { Again, all this growth logic seems unnecessary to me. Won't you quickly grow the buffer to 64K and then it will stay that big? I have to assume that you write a lot of data. @@ +278,5 @@ > + uint8_t bits = aValue & 127; > + aValue = aValue >> 7; > + if (aValue) > + bits |= 128; > + WriteBytes(&bits, 1); You should document this encoding in a comment. @@ +308,5 @@ > + AutoPassThroughThreadEvents pt; > + fprintf(stderr, "!!!!!!!!!!!! Input Mismatch !!!!!!!!!!!!\nRecorded: %d\nReplayed: %d\n", > + (int) oldValue, (int) aValue); > + } > + LastDitchRestoreSnapshot(); What does this do? @@ +335,5 @@ > + return gTlsThreadKey.get(); > +} > + > +void > +Thread::BindToCurrent() What does this do? @@ +370,5 @@ > +}; > +static ThreadChunk* gThreads; > + > +/* static */ Thread* > +Thread::GetById(size_t aId) What threads do these methods run on? You need assertions or something so it's obvious what state is touched by what threads. @@ +390,5 @@ > + Thread* thread; > + if (gThreads) { > + // Don't take the global lock when allocating new threads while replaying, > + // since all threads are allocated up front and ReplayUnlock assumes > + // that all such allocation has been performed. I'm still confused why we don't take the lock here. Does PR_Unlock cause ReplayUnlock to be called? @@ +398,5 @@ > + size_t nid = aId; > + while (nid >= THREAD_CHUNK_SIZE) { > + nid -= THREAD_CHUNK_SIZE; > + if (!chunk->next) > + chunk->next = new ThreadChunk(); So the only difference with GetById here is this line. Could you call that function and pass in an aCreate param or something? @@ +451,5 @@ > +{ > + return gNumThreads; > +} > + > +static size_t gNumRecordedThreads = 0; It sounds like this is only used during replay? Please comment. @@ +555,5 @@ > + > +LockAcquiresChunk* gLockAcquires; > + > +static inline LockAcquires* > +GetLockAcquires(size_t aId) It seems like you're using this same chunk pattern multiple times. Can you put it into a template? @@ +2509,5 @@ > + > +PR_IMPLEMENT(void) > +RecordReplayInterface_InternalRecordReplayAssert(const char* aText) > +{ > +#ifdef DEBUG This seems really expensive. Maybe it should be behind something other than DEBUG? @@ +2686,5 @@ > +static ThingHash gLocks = { sizeof(Lock*) }; > +static ReadWriteLock gLocksLock; > + > +void > +NewLock(void* aNativeLock) Who calls this? ::: toolkit/recordreplay/ProcessRecordReplay.h @@ +34,5 @@ > + ExecuteTriggersFinished, > + CallStart // Redirected call events start here. > +}; > + > +enum class AllocatedMemoryKind What's this for? @@ +58,5 @@ > +// Internal layout in the files has the following form: > +// > +// compressed16 #1 > +// decompressed16 #1 > +// ... compressed16 #1 bytes ... What does this mean? @@ +66,5 @@ > +// ... > + > +typedef size_t FileHandle; > + > +class File It seems like this file stuff should go in a separate C++ file. @@ +170,5 @@ > + > +// Information about the execution state of each thread. > +class Thread > +{ > +public: Why is everything in this class public? @@ +350,5 @@ > +private: > + SpinLock& mLock; > +}; > + > +struct AutoReadWriteLock Seems like some of this locking stuff could also go in its own file. @@ +415,5 @@ > +void ReplayDeallocateMemory(void* aAddress, size_t aSize, bool aMemoryLockHeld); > +void* ReplayTryAllocateMemory(void* aAddress, size_t aSize); > + > +// These APIs allow system calls to be performed directly, bypassing > +// any hooks. Who implements this stuff? ::: toolkit/recordreplay/ProcessRewind.cpp @@ +40,5 @@ > +static void ThreadNotify(size_t aId); > +static void ThreadWait(bool aAllowSave, bool aHasGlobalLock); > + > +/* static */ void > +Thread::ThreadMain(void* aArgument) Who starts this thread?
(In reply to Bill McCloskey (:billm) from comment #302) > Hi Brian, > I tried to understand this as best I could but it feels almost a little > hopeless. It's really hard to tell who calls what, or why they call it. > There's a little bit of documentation, but there needs to be a lot more. > It's also really bad that everything is grouped into two huge files. I'm > stuck looking at a long list of individual methods without knowing how they > fit into the whole. In some cases I think the callers are in different > patches, but I don't know which ones. > > I've kind of given up on offering substantive comments on how to improve the > code. As long as it doesn't break the rest of the browser, I guess it's > probably fine. But I'd like it to be possible for someone to understand the > code if they need to. I think the following would help: > > - Break it up into different files. There's nothing wrong with having a file > with only three or four methods in it. > - For each file, describe what code it's supporting (who calls it), what > constraints it operates under (can it use normal allocation, locking > functions), what thread it runs on, and whether it's used during recording, > replay, or all execution. > - Add assertions to individual functions about what threads they operate on. > Comment who calls them if it's not in the same file. OK, I'll make these changes and will post a new patch. I'll do the same for parts 9a and 10a, which are similarly miserly about using multiple files. I agree that people should be able to understand what this code is doing without expending a lot of effort. Thanks for looking at the patch! > ::: mfbt/RecordReplay.cpp > @@ +109,5 @@ > > +#undef DECLARE_SYMBOL > > +#undef DECLARE_SYMBOL_VOID > > + > > +static void* > > +LoadSymbol(const char* aName) > > I haven't followed the linkage issues in this bug. Nathan should really > review this file. Is the idea that the code lives in libxul, and JS engine > callers go through these wrappers? What happens with the global variables? Yes. The global variables are in libmozglue, which means that two dereferences are needed to check them. For now this is simplest, but if there is any measurable overhead associated with checking the globals then I want to try putting separate copies in each library so that only one dereference is needed. > @@ +130,5 @@ > > + } else if (res == compressedSize) { > > + mNextCompressedSize = 0; > > + mNextDecompressedSize = 0; > > + } else { > > + MOZ_CRASH(); > > Why crash here? This seems like a valid thing to have happen. You could be > interrupted by a signal, for example. Maybe DirectRead handles that? Yes, DirectRead blocks until the read actually completes, handling any EINTR errors internally.
Attachment #8792583 - Flags: review?(wmccloskey)
Comment on attachment 8790843 [details] [diff] [review] Part 9a - PReplay protocol and parent side implementation. Canceling per comment 303.
Attachment #8790843 - Flags: review?(wmccloskey)
Attachment #8790843 - Flags: review?(nical.bugzilla)
Attachment #8790843 - Flags: review?(bas)
Attachment #8790854 - Flags: review?(wmccloskey)
Attachment #8790854 - Flags: review?(nical.bugzilla)
Attachment #8790854 - Flags: review?(bas)
Comment on attachment 8790726 [details] [diff] [review] Part 1c - Redirections. Review of attachment 8790726 [details] [diff] [review]: ----------------------------------------------------------------- Apologies for the delay; I've looked at this several times over the past week, but each time I come to the conclusion that reviewing the record/replay wrappers themselves is not terribly useful. The code works, after all, and it's not like you can say, "Oh, you should take this wrapper out"--it's obviously there for a reason. I'd echo Bill's comments on other patches: ProcessRedirect.h could use a lot more commentary about the pieces it defines and the macros provided therein, maybe with a bit of guidance on "here's what to do if you have to wrap a previously unwrapped function." I don't have much substantive to offer here except for some thoughts on how the machinery could be made friendlier. ::: toolkit/recordreplay/ProcessRedirect.h @@ +139,5 @@ > + return rrf.mRval; \ > + rrf.StartRecordReplay(); \ > + File& events = rrf.mThread->mEvents; \ > + (void) events; \ > + aReturnType& rval = rrf.mRval I wonder if it's just as easy to get aReturnType from decltype(aName) here. Similarly for argument types, if we have to pass them in @@ +142,5 @@ > + (void) events; \ > + aReturnType& rval = rrf.mRval > + > +#define RecordReplayFunctionABI(aName, aReturnType, aABI, aActuals) \ > + RecordReplayFunctionFormals(aName, aReturnType, aABI, (...), aActuals) varargs here actually works? Ugh. @@ +144,5 @@ > + > +#define RecordReplayFunctionABI(aName, aReturnType, aABI, aActuals) \ > + RecordReplayFunctionFormals(aName, aReturnType, aABI, (...), aActuals) > + > +#define RecordReplayFunction(aName, aReturnType, aActuals) \ It was a little confusing reading code and realizing that this had to be a macro, because various locals were springing to life without being declared. I'm curious whether all of these macros could somehow be rewritten using better C++ features (lambdas?). @@ +166,5 @@ > +#define RecordReplayFunctionVoid(aName, aActuals) \ > + RecordReplayFunctionVoidABI(aName, DEFAULTABI, aActuals) > + > +// Note: Do not use any of the macros below when the redirected function takes > +// arguments that are not scalar values. Use RecordReplayFunction directly. What are "scalar values" in this context? Integers only? @@ +169,5 @@ > +// Note: Do not use any of the macros below when the redirected function takes > +// arguments that are not scalar values. Use RecordReplayFunction directly. > + > +// The following macros are used for functions that return a scalar value and > +// do not record an error anywhere (i.e. with errno or SetLastError). Can we static_assert that the function returns a scalar value (e.g. via decltype) and that it has scalar parameters, at least? @@ +176,5 @@ > + static size_t DEFAULTABI \ > + RR_ ##aName () \ > + { \ > + RecordReplayFunction(aName, size_t, ()); \ > + events.ReadOrWriteValue(&rval); \ I wonder if RecordOrReplayValue might be a better name for this. (And similarly for what's currently called ReadOrWriteBytes.) ::: toolkit/recordreplay/ProcessRedirectDarwin.cpp @@ +907,5 @@ > +// pthreads redirections > +/////////////////////////////////////////////////////////////////////////////// > + > +static ssize_t > +RR_pthread_cond_init(pthread_cond_t* aCond, const pthread_condattr_t* aAttr) Shouldn't this return the same size thing as pthread_cond_init, and similarly for all the other functions below? I guess it probably doesn't matter thanks to the ABI details.
Attachment #8790726 - Flags: review?(nfroyd) → feedback+
Comment on attachment 8790847 [details] [diff] [review] Part 9b - Handle separate PCompositorChild used in middleman processes. Review of attachment 8790847 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/ipc/TabChild.cpp @@ +2779,5 @@ > sTabChildren->Put(aLayersId, this); > mLayersId = aLayersId; > } > > + if (!PR_IsMiddleman()) Please brace the if blocks.
Attachment #8790847 - Flags: review?(nical.bugzilla) → review+
Depends on: 1309552
Attached patch patchSplinter Review
Updated rolled up patch. This is larger than the last patch but the record/replay code is structured much better now and has about twice as many comments.
Attachment #8792581 - Attachment is obsolete: true
This patch has the public record/replay API and its linkage strangeness.
Attachment #8792583 - Attachment is obsolete: true
Attachment #8800305 - Flags: review?(wmccloskey)
Attachment #8800305 - Flags: review?(nfroyd)
This patch has some simple utility classes which do not directly interact with the record/replay infrastructure but which the infrastructure needs.
Attachment #8790726 - Attachment is obsolete: true
Attachment #8800306 - Flags: review?(wmccloskey)
This patch has the core infrastructure needed to record/replay an execution.
Attachment #8800311 - Flags: review?(wmccloskey)
This patch has almost all the infrastructure needed for rewinding an execution (there is also some in Thread.cpp in part 1f).
Attachment #8800312 - Flags: review?(wmccloskey)
This patch has the infrastructure needed for redirecting functions. The redirection header is better documented, the redirection macros have changed somewhat (it turns out that using varargs on windows doesn't work: msvc treats __stdcall varargs function pointers as __cdecl), and the actual redirection code is cleaner now.
Attachment #8800313 - Flags: review?(nfroyd)
The actual platform specific function redirections. This hasn't changed much from the last version of this code, other than some new windows redirections, and doesn't need an in depth review I think.
Attachment #8800318 - Flags: review?(nfroyd)
This is a little cleaner now that RegisterTrigger takes an std::function.
Attachment #8796191 - Attachment is obsolete: true
Attachment #8800320 - Flags: review?(bugs)
This patch has the middleman process side of IPC with the replaying process, as before, though the compositor/graphics logic is now abstracted into separate files/patches.
Attachment #8800322 - Flags: review?(wmccloskey)
The middleman process side of compositor IPDL marshaling.
Attachment #8790843 - Attachment is obsolete: true
Attachment #8800327 - Flags: review?(nical.bugzilla)
DrawEvent translation and DrawTarget rendering performed in the middleman process.
Attachment #8800328 - Flags: review?(bas)
The replaying process side of IPC with the middleman process. As with part 9a, this is now separated from the compositor and graphics logic for the replaying process.
Attachment #8790854 - Attachment is obsolete: true
Attachment #8800331 - Flags: review?(wmccloskey)
This is a little cleaner now that NotifyUnrecordedWait takes an std::function.
Attachment #8790857 - Attachment is obsolete: true
Attachment #8790857 - Flags: review?(wmccloskey)
Attachment #8800333 - Flags: review?(wmccloskey)
This is also cleaner with the changes to NotifyUnrecordedWait.
Attachment #8790859 - Attachment is obsolete: true
Attachment #8790859 - Flags: review?(wmccloskey)
Attachment #8800335 - Flags: review?(wmccloskey)
This patch has the replaying process side of compositor state management.
Attachment #8800338 - Flags: review?(nical.bugzilla)
This patch has the child side of graphics state management, for keeping track of draw targets and sending draw events to the middleman process.
Attachment #8800339 - Flags: review?(bas)
This is similar to the last version of this patch, but has better documentation.
Attachment #8790888 - Attachment is obsolete: true
Attachment #8790888 - Flags: review?(bas)
Attachment #8800342 - Flags: review?(bas)
Comment on attachment 8800320 [details] [diff] [review] Part 7 - Ensure deterministic interaction of GC with CC and object references. So, how does this all work in worker threads? Is recording disabled there or are workers disabled when recording?
Attachment #8800320 - Flags: review?(bugs) → review+
(In reply to Olli Pettay [:smaug] from comment #325) > Comment on attachment 8800320 [details] [diff] [review] > Part 7 - Ensure deterministic interaction of GC with CC and object > references. > > So, how does this all work in worker threads? Is recording disabled there or > are workers disabled when recording? For now, DOM workers are disabled when recording or replaying.
Blocks: 1310271
Comment on attachment 8800305 [details] [diff] [review] Part 1a - Public record/replay API. Review of attachment 8800305 [details] [diff] [review]: ----------------------------------------------------------------- ::: mfbt/RecordReplay.h @@ +79,5 @@ > +// Whether the current process is a middleman between a replaying process and > +// chrome process. > +static inline bool IsMiddleman() { return gIsMiddleman; } > + > +// Mark a region which occurs atomically wrt the recording. No two threads can Do these functions enforce the fact that the region is atomic, or merely note it for replay? @@ +192,5 @@ > +static inline void RemoveHashTableItem(void* aTable, void* aThing); > +static inline void RemoveAllHashTableItems(void* aTable); > +static inline void MoveHashTableItems(void* aFirstTable, void* aSecondTable); > +static inline void SortHashTableItems(void* aTable, void** aArray, size_t aCount); > +MFBT_API void GetOrderedHashTableItems(void* aTable, void** aArray, size_t aCount); What's the difference between Sort and GetOrdered? @@ +202,5 @@ > +// not exposed here.) > +static inline const PLDHashTableOps* GeneratePLDHashTableCallbacks(const PLDHashTableOps* aOps); > +static inline const PLDHashTableOps* UnwrapPLDHashTableCallbacks(const PLDHashTableOps* aOps); > +static inline void DestroyPLDHashTableCallbacks(const PLDHashTableOps* aOps); > +static inline void MovePLDHashTableContents(const PLDHashTableOps* aFirstOps, What does this do? @@ +290,5 @@ > +// void DestroyContents(); > +// }; > +MFBT_API void RegisterTrigger(void* aObj, std::function<void()> aCallback); > +MFBT_API void UnregisterTrigger(void* aObj); > +MFBT_API void ActivateTrigger(void* aObj); Could we call this EnableTrigger instead? Activate makes it sound like it will run right away. @@ +322,5 @@ > +// > +// The callback passed to NotifyUnrecordedWait will be invoked at most once > +// by the main thread whenever the main thread is waiting for other threads to > +// become idle, and at most once after the call to NotifyUnrecordedWait if the > +// main thread is already waiting for other threads to become idle. I'm still confused when the callback is called. How do you know when you should no longer call the callback? What if the callback references state that will be deleted at some point in the future? An example or something would really help here. @@ +368,5 @@ > +// FinishReplaying has been called. > +MFBT_API bool LastSnapshotIsFinal(); > + > +// Return whether the last snapshot encountered is an interim one. If the > +// middleman asked us to rewind to snapshot N, but we did not record N but What does it mean for a snapshot not to be recorded? Why would you take a snapshot without recording it? How could the middleman reference such a snapshot?
Attachment #8800305 - Flags: review?(wmccloskey) → review+
(In reply to Bill McCloskey (:billm) from comment #327) > Comment on attachment 8800305 [details] [diff] [review] > Part 1a - Public record/replay API. > > Review of attachment 8800305 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: mfbt/RecordReplay.h > @@ +79,5 @@ > > +// Whether the current process is a middleman between a replaying process and > > +// chrome process. > > +static inline bool IsMiddleman() { return gIsMiddleman; } > > + > > +// Mark a region which occurs atomically wrt the recording. No two threads can > > Do these functions enforce the fact that the region is atomic, or merely > note it for replay? These functions enforce that the region is atomic, but will only do so when recording or replaying. Otherwise they are no-ops. > @@ +192,5 @@ > > +static inline void RemoveHashTableItem(void* aTable, void* aThing); > > +static inline void RemoveAllHashTableItems(void* aTable); > > +static inline void MoveHashTableItems(void* aFirstTable, void* aSecondTable); > > +static inline void SortHashTableItems(void* aTable, void** aArray, size_t aCount); > > +MFBT_API void GetOrderedHashTableItems(void* aTable, void** aArray, size_t aCount); > > What's the difference between Sort and GetOrdered? SortHashTableItems sorts an existing array, whereas GetOrderedHashTableItems allocates an array whose contents are sorted. I'll update the comment. > @@ +202,5 @@ > > +// not exposed here.) > > +static inline const PLDHashTableOps* GeneratePLDHashTableCallbacks(const PLDHashTableOps* aOps); > > +static inline const PLDHashTableOps* UnwrapPLDHashTableCallbacks(const PLDHashTableOps* aOps); > > +static inline void DestroyPLDHashTableCallbacks(const PLDHashTableOps* aOps); > > +static inline void MovePLDHashTableContents(const PLDHashTableOps* aFirstOps, > > What does this do? We ensure that PLDHashTables behave consistently during iteration by generating custom ops for each table that produce the same hash numbers between recording and replay. Each of these custom ops wraps the original ops for the table, and these functions are used for managing the custom ops. I'll update the comment to describe in more detail what these APIs are doing. > @@ +322,5 @@ > > +// > > +// The callback passed to NotifyUnrecordedWait will be invoked at most once > > +// by the main thread whenever the main thread is waiting for other threads to > > +// become idle, and at most once after the call to NotifyUnrecordedWait if the > > +// main thread is already waiting for other threads to become idle. > > I'm still confused when the callback is called. How do you know when you > should no longer call the callback? What if the callback references state > that will be deleted at some point in the future? An example or something > would really help here. Once NotifyUnrecordedWait has been called, the callback can be invoked any number of times, at any point in the future. This API is used by threads that are servicing some sort of event loop (be it posted runnables, incoming/outgoing data on a pipe, JS helper thread tasks, ...) and that will never terminate when replaying. > @@ +368,5 @@ > > +// FinishReplaying has been called. > > +MFBT_API bool LastSnapshotIsFinal(); > > + > > +// Return whether the last snapshot encountered is an interim one. If the > > +// middleman asked us to rewind to snapshot N, but we did not record N but > > What does it mean for a snapshot not to be recorded? Why would you take a > snapshot without recording it? How could the middleman reference such a > snapshot? Hmm, the terminology here is pretty bad. Right now there are two related concepts: Snapshot: a point in execution which has been assigned an identifier. Recorded snapshot: A snapshot where we have captured enough information that we can restore the process' state at this point sometime in the future. I think it would be better to rename 'snapshot' to 'checkpoint', and 'recorded snapshot' to just plain 'snapshot'. Does this make more sense? (The existing terminology dates back to the time when all checkpoints/snapshots were recorded so there wasn't any need to distinguish the two.) Anyways, using this new terminology, whenever the replaying process reaches a checkpoint it notifies the middleman, so that the middleman knows what happens between each checkpoint (and can, say, compute how many times each JS debugger breakpoint was hit). When the JS debugger (or whichever other client exists in the middleman) wants the replaying process to rewind, it asks the replaying process to go backwards one checkpoint at a time. If the checkpoint it wants to rewind to does not have an associated snapshot, the replaying process rewinds to the previous snapshot and runs forward to the desired checkpoint from there. All the checkpoints hit before the desired checkpoint is reached are interim ones.
(In reply to Brian Hackett (:bhackett) from comment #328) > I think it would be better to rename 'snapshot' to 'checkpoint', and > 'recorded snapshot' to just plain 'snapshot'. Does this make more sense? > (The existing terminology dates back to the time when all > checkpoints/snapshots were recorded so there wasn't any need to distinguish > the two.) Yes, I like this much better.
Comment on attachment 8800306 [details] [diff] [review] Part 1c - Record/replay utilities. Review of attachment 8800306 [details] [diff] [review]: ----------------------------------------------------------------- I'm not too happy about these data structures since they mostly duplicate stuff we already have. I can understand the need for spinlocks when you don't want to call into the OS. Maybe, as I get further into the patches and see how this stuff is used, I'll feel better about it. ::: toolkit/recordreplay/ChunkAllocator.h @@ +19,5 @@ > +// cost of O(n) lookup. > +// > +// ChunkAllocator contents are never destroyed. > +template <typename T> > +class ChunkAllocator Can this be marked MOZ_ONLY_USED_TO_AVOID_STATIC_CONSTRUCTORS? @@ +25,5 @@ > + // A page sized block holding a next pointer and an array of as many things > + // as possible. > + struct Chunk > + { > + uint8_t mStorage[PageSize - sizeof(Chunk*)]; It would be nicer to use AlignedStorage for this. @@ +67,5 @@ > + > + // Create a new entry with the specified ID. This must not be called on IDs > + // that have already been used with this allocator. > + inline T* Create(size_t aId) { > + if (aId < mCapacity) { This isn't really safe. You're reading from mCapacity without a lock, and it could be on a different cache line from the chunk itself. So you could end up getting a more recent value for mCapacity than you do for the chunk data (and, crucially, the chunk next pointer). In that case you'll crash. I think this will be safe if you use a release-acquire Atomic for mCapacity. I'd feel a lot better if you just the spinlock around the whole thing though. Is it really that important to be lockless? ::: toolkit/recordreplay/InfallibleVector.h @@ +91,5 @@ > +template<typename T, > + size_t MinInlineCapacity = 0, > + class AllocPolicy = MallocAllocPolicy> > +class InfallibleVector > + : public InfallibleVectorOperations<InfallibleVector<T, MinInlineCapacity, AllocPolicy>, Instead of doing this, could you just use a normal vector with an infallible alloc policy? http://searchfox.org/mozilla-central/source/memory/mozalloc/mozalloc.h#289 @@ +105,5 @@ > + > +template<typename T, > + size_t MinInlineCapacity = 0, > + class AllocPolicy = MallocAllocPolicy> > +class StaticInfallibleVector This is kind of nice, but it seems like we'll leak the memory on shutdown. I'm not sure if that will cause problems for LSan. ::: toolkit/recordreplay/Monitor.h @@ +14,5 @@ > + > +// Simple wrapper around a PRLock and PRCondVar. This is a lighter weight > +// abstraction than mozilla::Monitor and has simpler interactions with the > +// record/replay system. > +class Monitor This seems pretty similar to mozilla::Monitor except that it lacks deadlock detection and it does this pass-through thing. It seems like deadlock detection would be deterministic. Why not use mozilla::Monitor? ::: toolkit/recordreplay/SpinLock.h @@ +25,5 @@ > +// locking APIs. These locks are used in places where reentrance into APIs > +// needs to be avoided, or where writes to heap memory are not allowed. > + > +// A basic spin lock. > +class SpinLock Can this be marked MOZ_ONLY_USED_TO_AVOID_STATIC_CONSTRUCTORS? @@ +32,5 @@ > + inline void Lock(); > + inline void Unlock(); > + > +private: > + int32_t mLocked; This should be Atomic with release-acquire semantics. @@ +37,5 @@ > +}; > + > +// A basic read/write spin lock. This lock permits either multiple readers and > +// no writers, or one writer. > +class ReadWriteSpinLock MOZ_ONLY_USED_TO_AVOID_STATIC_CONSTRUCTORS? @@ +147,5 @@ > + AutoSpinLock ex(mLock); > + done = aRead ? (mReaders != -1) : (mReaders == 0); > + if (done) { > + mReaders = aRead ? (mReaders + 1) : -1; > + } Maybe ThreadYield outside of the spin lock if you fail to take the lock?
Attachment #8800306 - Flags: review?(wmccloskey)
Comment on attachment 8792585 [details] [diff] [review] Part 1d - Teardown of record/replay state Review of attachment 8792585 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/xre/nsEmbedFunctions.cpp @@ +852,5 @@ > XRE_ShutdownChildProcess() > { > + // If this is a replaying process, idle indefinitely in case the middleman > + // wants us to rewind later on. > + if (recordreplay::IsReplaying()) { This might make more sense in ContentChild. XRE_ShutdownChildProcess is used by different kinds of child processes, and you really only care about ContentChild.
Attachment #8792585 - Flags: review?(wmccloskey) → review+
(In reply to Bill McCloskey (:billm) from comment #330) > ::: toolkit/recordreplay/InfallibleVector.h > @@ +91,5 @@ > > +template<typename T, > > + size_t MinInlineCapacity = 0, > > + class AllocPolicy = MallocAllocPolicy> > > +class InfallibleVector > > + : public InfallibleVectorOperations<InfallibleVector<T, MinInlineCapacity, AllocPolicy>, > > Instead of doing this, could you just use a normal vector with an infallible > alloc policy? > http://searchfox.org/mozilla-central/source/memory/mozalloc/mozalloc.h#289 Unfortunately, all the fallible methods on Vector are MOZ_MUST_USE, regardless of the alloc policy. The purpose of this class is to avoid writing (ugly, error prone) code like this all over the place to satisfy the static analysis: if (!foo.append(bar)) { MOZ_CRASH(); } Creating a new InfallibleVector class isn't ideal (though it's nice that it shares code with StaticInfallibleVector), but AFAICT we don't have another way to get an infallible growing array with an arbitrary alloc policy. > ::: toolkit/recordreplay/Monitor.h > @@ +14,5 @@ > > + > > +// Simple wrapper around a PRLock and PRCondVar. This is a lighter weight > > +// abstraction than mozilla::Monitor and has simpler interactions with the > > +// record/replay system. > > +class Monitor > > This seems pretty similar to mozilla::Monitor except that it lacks deadlock > detection and it does this pass-through thing. It seems like deadlock > detection would be deterministic. Why not use mozilla::Monitor? Hmm, deadlock detection is deterministic, but the main problem here is that this recordreplay::Monitor class is used to create monitors while replaying that did not exist while recording. IIRC the deadlock detector lazily creates some internal state, and if we create an unrecorded mozilla::Monitor near process startup while replaying, then that state is created at a different point than while recording and the replay gets out of sync. This could be fixed, but it seemed simpler and less fragile to just add a separate monitor class that doesn't have this baggage.
(In reply to Brian Hackett (:bhackett) from comment #332) > Unfortunately, all the fallible methods on Vector are MOZ_MUST_USE, > regardless of the alloc policy. The purpose of this class is to avoid > writing (ugly, error prone) code like this all over the place to satisfy the > static analysis: OK, I guess that's fine. > > This seems pretty similar to mozilla::Monitor except that it lacks deadlock > > detection and it does this pass-through thing. It seems like deadlock > > detection would be deterministic. Why not use mozilla::Monitor? > > Hmm, deadlock detection is deterministic, but the main problem here is that > this recordreplay::Monitor class is used to create monitors while replaying > that did not exist while recording. IIRC the deadlock detector lazily > creates some internal state, and if we create an unrecorded mozilla::Monitor > near process startup while replaying, then that state is created at a > different point than while recording and the replay gets out of sync. This > could be fixed, but it seemed simpler and less fragile to just add a > separate monitor class that doesn't have this baggage. OK. It looks like BlockingResourceBase uses PR_CallOnce, so I can imagine how that could screw things up. Can you maybe comment to that effect so people know why this exists?
> Unfortunately, all the fallible methods on Vector are MOZ_MUST_USE, > regardless of the alloc policy. The purpose of this class is to avoid > writing (ugly, error prone) code like this all over the place to satisfy the > static analysis: > > if (!foo.append(bar)) { > MOZ_CRASH(); > } > > Creating a new InfallibleVector class isn't ideal (though it's nice that it > shares code with StaticInfallibleVector), but AFAICT we don't have another > way to get an infallible growing array with an arbitrary alloc policy. DMD has the same problem -- it uses an infallible allocator. The solution used for it is to sprinkle MOZ_ALWAYS_TRUE around liberally, like this: MOZ_ALWAYS_TRUE(mSet.add(p, newString)); It's not ideal, but it's better than an if+MOZ_CRASH.
Attachment #8790734 - Flags: review?(wmccloskey) → review+
I believe (void) foo.append(bar); works, and is an easier idiom to recognize.
(In reply to Steve Fink [:sfink] [:s:] from comment #335) > I believe > > (void) foo.append(bar); > > works, and is an easier idiom to recognize. It will satisfy clang but will not satisfy gcc. You could also use |mozilla::Unused << foo.append(bar);|.
Comment on attachment 8800305 [details] [diff] [review] Part 1a - Public record/replay API. Review of attachment 8800305 [details] [diff] [review]: ----------------------------------------------------------------- Thanks for updating the documentation. r=me; I trust you on the documentation bits, but I'd like to know what's up with the |static inline| bits. ::: mfbt/RecordReplay.cpp @@ +37,5 @@ > + Macro(AllocateUntrackedMemory, void*, (size_t aSize), (aSize)) \ > + Macro(NextWeakPointerReadSucceeds, bool, (const void* aPtr), (aPtr)) \ > + Macro(InternalThingIndex, size_t, (void* aThing), (aThing)) > + > +#define FOR_EACH_INTERFACE_VOID(Macro) \ I'm not entirely sure it's worth it to have a separate macro just for void return types: judging from the uses and macros below, it seems like the void-specific ones do nothing but call the return-valued ones with |void| appropriately (or similar). But I guess since this is an internal implementation detail, it doesn't matter too much... ::: mfbt/RecordReplay.h @@ +2,5 @@ > +/* vim: set ts=8 sts=2 et sw=2 tw=80: */ > +/* This Source Code Form is subject to the terms of the Mozilla Public > + * License, v. 2.0. If a copy of the MPL was not distributed with this > + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */ > + Can you provide a one-line comment here summarizing the header so that it shows up nicely in DXR et al? @@ +65,5 @@ > +/////////////////////////////////////////////////////////////////////////////// > +// Public API > +/////////////////////////////////////////////////////////////////////////////// > + > +extern MFBT_DATA bool gIsRecordingOrReplaying; MFBT_DATA and such need to have mozilla/Types.h included above @@ +84,5 @@ > +// be in an atomic region at once, and the order in which atomic sections are > +// executed by the various threads will be the same in the replay as in the > +// recording. > +static inline void BeginOrderedAtomicAccess(); > +static inline void EndOrderedAtomicAccess(); Where are these functions (and similar |static inline| declarations) actually defined? Do they come in a later patch that put them in this header, or out-of-line? |inline| seems like not what you want for these if they are actually defined out-of-line... @@ +106,5 @@ > +// Whether events in this thread are passed through. > +static inline bool AreThreadEventsPassedThrough(); > + > +// RAII class for regions where thread events are passed through. > +struct MOZ_RAII AutoPassThroughThreadEvents We need to include mozilla/Attributes.h to use this. @@ +357,5 @@ > +/////////////////////////////////////////////////////////////////////////////// > +// Debugger API > +/////////////////////////////////////////////////////////////////////////////// > + > +// These functions are for use by the debugger in a replaying process. Can you expand this comment a bit? Reading "debugger" here made me think gdb/lldb/etc., but that's not what's in view here, and this "debugger" concept hasn't been discussed before in this file AFAICS. (The HasDivergedFromRecording comment mentions it, but only in cursory manner.) The middleman process was discussed above, but that sounded like it was solely for IPC, not any kind of debugging or process control, even though some functions below (e.g. LastSnapshotIsInterim) have comments discussing the middleman process.
Attachment #8800305 - Flags: review?(nfroyd) → review+
Comment on attachment 8800318 [details] [diff] [review] Part 1i - Platform specific redirections. Review of attachment 8800318 [details] [diff] [review]: ----------------------------------------------------------------- rs=me
Attachment #8800318 - Flags: review?(nfroyd) → review+
Comment on attachment 8800313 [details] [diff] [review] Part 1h - Redirections infrastructure. Review of attachment 8800313 [details] [diff] [review]: ----------------------------------------------------------------- This all seems much nicer than the previous version, thanks for the revisions. Apologies for the delays in the reviewing all of these. ::: toolkit/recordreplay/CallFunction.h @@ +29,5 @@ > +// > +// And so forth. > +#define DefineCallFunction(aABI, aReturnType, aFormals, aFormalTypes, aActuals) \ > + static inline aReturnType CallFunction ##aABI aFormals { \ > + return BitwiseCast<aReturnType (aABI *) aFormalTypes>(aFn) aActuals; \ I'm pretty sure you could do this all with variadic templates, but somehow I think that would be more pain than it's worth... ::: toolkit/recordreplay/ProcessRedirect.cpp @@ +678,5 @@ > + UnprotectExecutableMemory(patch.mStart, patch.mShort ? ShortJumpBytes : JumpBytesClobberRax); > + } > + for (size_t i = 0; i < gClobberPatches.length(); i++) { > + const ClobberPatch& patch = gClobberPatches[i]; > + UnprotectExecutableMemory(patch.mStart, patch.mEnd - patch.mStart); I think we want to reprotect the patched memory from these two loops...or do we require these regions to be unprotected while things are running? ::: toolkit/recordreplay/ProcessRedirect.h @@ +3,5 @@ > +/* This Source Code Form is subject to the terms of the Mozilla Public > + * License, v. 2.0. If a copy of the MPL was not distributed with this > + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */ > + > +#ifndef mozilla_toolkit_recordreplay_ProcessRedirect_h Single-line comment prior to this for DXR et al would be great. Here and the other headers as well.
Attachment #8800313 - Flags: review?(nfroyd) → review+
(In reply to Nathan Froyd [:froydnj] from comment #337) > @@ +84,5 @@ > > +// be in an atomic region at once, and the order in which atomic sections are > > +// executed by the various threads will be the same in the replay as in the > > +// recording. > > +static inline void BeginOrderedAtomicAccess(); > > +static inline void EndOrderedAtomicAccess(); > > Where are these functions (and similar |static inline| declarations) > actually defined? Do they come in a later patch that put them in this > header, or out-of-line? |inline| seems like not what you want for these if > they are actually defined out-of-line... They are defined later in this file via some macros, under the "API inline function implementation" comment.
(In reply to Nathan Froyd [:froydnj] from comment #339) > ::: toolkit/recordreplay/ProcessRedirect.cpp > @@ +678,5 @@ > > + UnprotectExecutableMemory(patch.mStart, patch.mShort ? ShortJumpBytes : JumpBytesClobberRax); > > + } > > + for (size_t i = 0; i < gClobberPatches.length(); i++) { > > + const ClobberPatch& patch = gClobberPatches[i]; > > + UnprotectExecutableMemory(patch.mStart, patch.mEnd - patch.mStart); > > I think we want to reprotect the patched memory from these two loops...or do > we require these regions to be unprotected while things are running? I'll fix this so we reprotect the regions (I didn't do it after Jan's review comment earlier because there was an interaction with the mprotect redirection on OS X --- right now we ignore all mprotect calls while replaying --- but that redirection can be fixed).
So, here we are at comment 342! Brian, could you give us a summary of how things are going?
Flags: needinfo?(bhackett1024)
(In reply to Jim Blandy :jimb from comment #342) > So, here we are at comment 342! Brian, could you give us a summary of how > things are going? Well, most of the work I've done lately has been on the Windows port in bug 1310271, though for the last few weeks I haven't been working on this project at all. The reviews in this bug have stalled and I no longer have an idea of how this project fits into the future of the browser, and this has been rather demotivating. I'm hoping to get things cleared up next week in Hawaii and then resume working on the Windows port.
Flags: needinfo?(bhackett1024)
> I no longer have an idea of how this project fits into the future of the browser, and this has been rather demotivating. I'm sorry to hear that. At DevTools we have a pretty clear idea how this fits into the future of the browser, in that we're considering it a killer feature, albeit one at high risk due to the complexity of the implementation. Can you say more about how you'd originally seen it fitting in, and why that isn't working out?
(In reply to Jim Blandy :jimb from comment #344) > > I no longer have an idea of how this project fits into the future of the browser, and this has been rather demotivating. > > I'm sorry to hear that. At DevTools we have a pretty clear idea how this > fits into the future of the browser, in that we're considering it a killer > feature, albeit one at high risk due to the complexity of the > implementation. Can you say more about how you'd originally seen it fitting > in, and why that isn't working out? I'm mainly concerned about when and how to land this without interfering with Quantum or other ongoing work. I don't want to be overly negative/dramatic; this project is the coolest thing I've ever done, and I'm fully committed to it.
Worth to discuss next week when would be the best time to land this. My initial guess is asap. A question is how to ensure all this keeps working and how other people can modify the code which this bug is touching. The code here is after all rather black magic on first sight. I assume we will have to disable some Quantum (DOM) specific stuff when replay is enabled, but that is probably fine.
We discussed this at the work week. The basic consensus is that: 1. As long as it doesn't break anything and the code is well-commented, it should land. 2. Brian is solely responsible for keeping it working. We should have tests, but Brian is responsible for keeping the tests working. If record/reply gets in the way of some team's work, that team should do whatever is easiest to disable the code. Brian will fix it. 3. We'll do some sort of user testing on Nightly to see how valuable this functionality is to web developers. If it's as useful as we hope, then we'll have good reason to support it more fully. If developers aren't interested, then we'll remove it. I guess we didn't discuss who will own part (3). Jim, is this something the devtools team could manage? It seems like Shield studies exist for this purpose, but I don't know very much about them.
Flags: needinfo?(jimb)
Attachment #8800311 - Flags: review?(wmccloskey) → review+
Attachment #8800312 - Flags: review?(wmccloskey) → review+
Attachment #8790740 - Flags: review?(wmccloskey) → review+
(In reply to Bill McCloskey (:billm) from comment #347) > I guess we didn't discuss who will own part (3). Jim, is this something the > devtools team could manage? It seems like Shield studies exist for this > purpose, but I don't know very much about them. I will bring this up with Devtools management.
Flags: needinfo?(jimb)
Comment on attachment 8790834 [details] [diff] [review] Part 8b - Manually record/replay mach_msg IPC calls. Review of attachment 8790834 [details] [diff] [review]: ----------------------------------------------------------------- ::: ipc/chromium/src/chrome/common/mach_ipc_mac.mm @@ +277,5 @@ > + timeout, // timeout in ms > + MACH_PORT_NULL); > + } > + > + mozilla::recordreplay::AutoOrderedAtomicAccess(); What is this expected to do? Did you mean to include a variable name here?
Attachment #8790834 - Flags: review?(wmccloskey) → review+
Attachment #8800322 - Flags: review?(wmccloskey) → review+
Attachment #8790848 - Flags: review?(wmccloskey) → review+
Comment on attachment 8790851 [details] [diff] [review] Part 9d - Allow sync messages to be sent off the main thread by middleman or replaying processes. Review of attachment 8790851 [details] [diff] [review]: ----------------------------------------------------------------- ::: ipc/glue/MessageChannel.cpp @@ +1622,5 @@ > AssertWorkerThread(); > > int prio = aMsg.priority(); > > + MOZ_RELEASE_ASSERT(prio == IPC::Message::PRIORITY_NORMAL || Can you add a comment that we only expect messages with elevated priority to be used on the main thread or during replay?
Attachment #8790851 - Flags: review?(wmccloskey) → review+
Attachment #8800331 - Flags: review?(wmccloskey) → review+
(In reply to Bill McCloskey (:billm) from comment #349) > Comment on attachment 8790834 [details] [diff] [review] > Part 8b - Manually record/replay mach_msg IPC calls. > > Review of attachment 8790834 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: ipc/chromium/src/chrome/common/mach_ipc_mac.mm > @@ +277,5 @@ > > + timeout, // timeout in ms > > + MACH_PORT_NULL); > > + } > > + > > + mozilla::recordreplay::AutoOrderedAtomicAccess(); > > What is this expected to do? Did you mean to include a variable name here? Calling AutoOrderedAtomicAccess() without an RAII class instance can be used to record/replay the order in which certain events execute. In this case we are ensuring that during the replay we don't replay the receipt of messages before the sender has actually sent them. I'll add a comment here, but first I'll look more into just hooking mach_msg and removing all this instrumentation, which is what we really should be doing.
Attachment #8800333 - Flags: review?(wmccloskey) → review+
Attachment #8800335 - Flags: review?(wmccloskey) → review+
Comment on attachment 8790860 [details] [diff] [review] Part 10e - Don't allow snapshots to be taken when in the middle of sending replay specific IPC messages. Review of attachment 8790860 [details] [diff] [review]: ----------------------------------------------------------------- ::: ipc/chromium/src/chrome/common/ipc_channel_posix.cc @@ +569,5 @@ > + // replayed) message then make sure each segment is fully sent before > + // returning control to the message loop. Otherwise we might take a > + // snapshot while only part of the message has been sent, and if we later > + // restore the snapshot we will end up sending only the tail end of the > + // message to the other side of the pipe. I guess you can't get deadlocks here because only one side of the channel can block like this. Please say so in the comment. @@ +706,5 @@ > > + // If we need to send all outgoing messages before returning control to > + // the message loop, don't return control to libevent. This might result > + // in a busy-wait. > + if (send_all_messages) { Is this part necessary? Why would we ever get here if send_all_messages is true? Shouldn't bytes_written always equal amt_to_write (unless there was an error)? Also, please note the EMSGSIZE issue above. It seems like you might need to deal with that.
Attachment #8790860 - Flags: review?(wmccloskey) → review+
Attachment #8790863 - Flags: review?(wmccloskey) → review+
Comment on attachment 8790896 [details] [diff] [review] Part 15b - Spawning record/replay/middleman processes. Review of attachment 8790896 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/ipc/ContentParent.cpp @@ +786,5 @@ > maxContentProcesses = 1; > > + if (sNonAppContentParents->Length() >= uint32_t(maxContentProcesses) && > + !recordExecution.Length() && > + !replayExecution.Length()) I think this code could be made cleaner if you rebase. There's some new code to handle different types of content processes, and hopefully you'll be able to benefit from it. ::: ipc/glue/GeckoChildProcessHost.cpp @@ +80,5 @@ > > static bool > ShouldHaveDirectoryService() > { > + return GeckoProcessType_Default == XRE_GetProcessType() || PR_IsMiddleman(); What is this needed for? @@ +526,3 @@ > { > + // Set environment variables if we want the child process to record or replay > + // its behavior. This doesn't seem like the right place for this code. Could you instead pass in a generic set of environment variables and have ContentParent decide what they should be? @@ +531,5 @@ > + nsAutoCString cmd; > + cmd.Append("RECORD="); > + cmd.Append(NS_ConvertUTF16toUTF8(aRecordExecution)); > + PR_SetEnv(cmd.get()); > + recordreplay::InvalidateReplayDirectory(getenv("RECORD")); Which patch is this function in?
Attachment #8790896 - Flags: review?(wmccloskey)
Attachment #8790842 - Flags: review?(nical.bugzilla) → review+
Comment on attachment 8800327 [details] [diff] [review] Part 9e - Parent side of compositor management. Review of attachment 8800327 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/recordreplay/ipc/ParentCompositor.h @@ +33,5 @@ > +// process needs to be updated to reflect its state at the point being > +// rewound to. To handle this, the parent keeps an ordering of every > +// compositor update that the replaying process has sent, and when rewinding > +// it sends updates to the chrome process to dismantle the layer tree and > +// reconstruct the state at the snapshot being rewound to. It would be nice to add a link to the wiki page that explains web replay in this comment (and other large-ish explicative comment blocks). The terminology and concepts are a bit opaque if you haven't read that page beforehand.
Attachment #8800327 - Flags: review?(nical.bugzilla) → review+
Attachment #8800338 - Flags: review?(nical.bugzilla) → review+
Attachment #8790875 - Flags: review?(nical.bugzilla) → review+
Comment on attachment 8790789 [details] [diff] [review] Part 5h - Disable Cocoa native drawing when recording or replaying. Review of attachment 8790789 [details] [diff] [review]: ----------------------------------------------------------------- This shouldn't be relevant anymore.
Attachment #8790789 - Flags: review?(bas) → review+
Comment on attachment 8790877 [details] [diff] [review] Part 14a - DrawTargetRecording refactoring to allow for common code with DrawTargetRecordReplay. Review of attachment 8790877 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/2d/RecordedEvent.h @@ +212,5 @@ > + nsRefPtrHashtable<nsPtrHashKey<void>, SourceSurface> mSourceSurfaces; > + nsRefPtrHashtable<nsPtrHashKey<void>, FilterNode> mFilterNodes; > + nsRefPtrHashtable<nsPtrHashKey<void>, GradientStops> mGradientStops; > + nsRefPtrHashtable<nsPtrHashKey<void>, ScaledFont> mScaledFonts; > + nsRefPtrHashtable<nsUint64HashKey, NativeFontResource> mNativeFontResources; Please make a separate child implementation, that has all these overloads and is inherited by PrintTranslator and whatever you're going to use (SimpleTranslator?). It doesn't matter right now, but in the future we don't want to require people to incur the memory cost of these when they're not going to use them.
Attachment #8790877 - Flags: review?(bas) → review-
Comment on attachment 8790877 [details] [diff] [review] Part 14a - DrawTargetRecording refactoring to allow for common code with DrawTargetRecordReplay. Review of attachment 8790877 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/2d/RecordedEvent.h @@ +212,5 @@ > + nsRefPtrHashtable<nsPtrHashKey<void>, SourceSurface> mSourceSurfaces; > + nsRefPtrHashtable<nsPtrHashKey<void>, FilterNode> mFilterNodes; > + nsRefPtrHashtable<nsPtrHashKey<void>, GradientStops> mGradientStops; > + nsRefPtrHashtable<nsPtrHashKey<void>, ScaledFont> mScaledFonts; > + nsRefPtrHashtable<nsUint64HashKey, NativeFontResource> mNativeFontResources; Please make a separate child implementation, that has all these overloads and is inherited by PrintTranslator and whatever you're going to use (SimpleTranslator?). It doesn't matter right now, but in the future we don't want to require people to incur the memory cost of these when they're not going to use them.
Comment on attachment 8790880 [details] [diff] [review] Part 14b - Fix backend getter for DrawTargetRecording. Review of attachment 8790880 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/2d/DrawTargetRecording.h @@ -31,5 @@ > > ~DrawTargetRecording(); > > virtual DrawTargetType GetType() const override { return mFinalDT->GetType(); } > - virtual BackendType GetBackendType() const override { return mFinalDT->GetBackendType(); } This can actually cause problems, we don't want the recording DT to be visible to other users or we'll construct incorrect things inside Mozilla when using one.
Attachment #8790880 - Flags: review?(bas) → review-
Comment on attachment 8790880 [details] [diff] [review] Part 14b - Fix backend getter for DrawTargetRecording. Review of attachment 8790880 [details] [diff] [review]: ----------------------------------------------------------------- FWIW, the comment here is correct, there were some bugs with this, there still might be, this is what 'IsRecording' is for, those places should check for that as well.
Comment on attachment 8790882 [details] [diff] [review] Part 14c - Allow recording translator to manage creation of similar draw targets. Review of attachment 8790882 [details] [diff] [review]: ----------------------------------------------------------------- I think your other suggestion is better, having a Translator::CreateSimilarDT.
Attachment #8790882 - Flags: review?(bas) → review-
Attachment #8790885 - Flags: review?(bas) → review+
Comment on attachment 8790890 [details] [diff] [review] Part 14g - Create DrawTargetRecordReplay in gfx Factory when recording or replaying. Review of attachment 8790890 [details] [diff] [review]: ----------------------------------------------------------------- Code inside Moz2D cannot depend on code -compiled- in Gecko, you'll have to set this as a flag on the factory and check it from there.
Attachment #8790890 - Flags: review?(bas) → review-
Comment on attachment 8800328 [details] [diff] [review] Part 9f - Parent side of DrawTarget management. Review of attachment 8800328 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/recordreplay/ipc/ParentGraphics.cpp @@ +82,5 @@ > + // process specified. > + if (aShmem) { > + info.mData = GetShmem(aShmem).get<uint8_t>(); > + } else { > + info.mData = new uint8_t[aStride * aSize.height]; I can't seem to find where this is freed. @@ +164,5 @@ > + return gfx::FontType::DWRITE; > + case gfx::BackendType::CAIRO: > + return gfx::FontType::CAIRO; > + case gfx::BackendType::SKIA: > + return gfx::FontType::SKIA; Cairo and Skia can both handle DWrite fonts under circumstances. Be aware this might create inconsistencies between a 'real' session and a record/replay. @@ +363,5 @@ > + gTranslator->DecodeFromSnapshot(aInfo); > + > + // Restore the contents of all draw target data buffers by fetching them from > + // the replaying process. > + for (size_t i = 1; i < aInfo->mDrawTargetIndex; i++) { I'm a little worried this doesn't always guarantee the stored State on the DT to be correct. But it might not matter in practice.
Attachment #8800328 - Flags: review?(bas) → review+
Comment on attachment 8790892 [details] [diff] [review] Part 14h - Optimize recording size for font data events. Review of attachment 8790892 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/2d/RecordedEvent.cpp @@ +1511,5 @@ > +{ > + // Explicitly record and replay font data information. Computing this data > + // can consume a large portion of recordings, and it is more efficient to > + // record the data itself rather than the calls made when computing it. > + if (PR_IsReplaying()) { Same as before, will have to come off the factory. @@ +1523,5 @@ > + PR_RecordReplayBytes(&glyphSize, sizeof(glyphSize)); > + { > + recordreplay::AutoPassThroughThreadEvents pt; > + FontDataProc(data, size, index, glyphSize, this); > + } Not sure how you'd do all this, some kind of interface presumably.
Attachment #8790892 - Flags: review?(bas) → review-
Comment on attachment 8800339 [details] [diff] [review] Part 10j - Child side of DrawTarget management. Review of attachment 8800339 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/recordreplay/ipc/ChildGraphics.cpp @@ +48,5 @@ > + DrawTargetInfo() { PodZero(this); } > + > + void Clear() { > + if (mOwnsData) { > + free(mData); This should be a delete [] and not a free. ::: toolkit/recordreplay/ipc/ChildGraphics.h @@ +57,5 @@ > +// this method, the contents of the data used by this draw target will reflect > +// drawing commands issued so far. > +void ReplayFlushDrawTarget(gfx::DrawTarget* aTarget); > + > +// Flush a draw target and make a fresh copy of its data. This should have a comment that the pointer pointer to by aData should be freed by calling delete[] on it. (From a module using the same new/delete operator overloads as this one, but hopefully within Gecko that's sort of guaranteed as we depend on that in a lot of places).
Attachment #8800339 - Flags: review?(bas) → review+
Comment on attachment 8800342 [details] [diff] [review] Part 14f - Add DrawTargetRecordReplay. Review of attachment 8800342 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/recordreplay/DrawTargetRecordReplay.cpp @@ +244,5 @@ > + } > + > + ~SourceSurfaceRecordReplay() > + { > + free(mData); Needs to be delete []. @@ +543,5 @@ > + AutoPassThroughThreadEvents pt; > + RefPtr<SourceSurface> surf = aDT->Snapshot(); > + RefPtr<DataSourceSurface> dataSurf = surf->GetDataSurface(); > + DataSourceSurface::ScopedMap map(dataSurf, DataSourceSurface::READ); > + aStride = map.GetStride(); Note that 'technically' aStride is a 'per map' value. In your case you might be okay, however more importantly there's no guarantee the stride of a source surface is equal to that of the underlying DrawTarget, especially when using D2D. However, it should always be > BytesPerPixel(Format) * width, so you -should- be alright considering your code.
Attachment #8800342 - Flags: review?(bas) → review+
Comment on attachment 8790872 [details] [diff] [review] Part 11 - C++ JS debugger changes. Forwarding review to billm -- I don't understand enough to review these, but I think the answers are in parts you've already reviewed.
Attachment #8790872 - Flags: review?(jorendorff) → review?(wmccloskey)
Attachment #8790873 - Flags: review?(jorendorff) → review?(wmccloskey)
Comment on attachment 8790872 [details] [diff] [review] Part 11 - C++ JS debugger changes. Sorry, there's no way I'll ever get to this.
Attachment #8790872 - Flags: review?(wmccloskey)
Depends on: 1422587
I've restarted work on Web Replay. I'll be continuing to develop the technology rather than trying to land it, so I opened bug 1422587 for this new work.
Pushed by bhackett@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/385ff3d8137a Part 5e - Don't assume that CFBundleGetFunctionPointerForName succeeds, r=benwa.
Web Replay is finally ready to land! A fair number of the patches in this bug are still relevant and will be landing as well. I'll be dividing things into multiple pushes.
Whiteboard: leave-open
Pushed by bhackett@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/10253e262c7f Part 2a - Atomics interface changes, r=waldo. https://hg.mozilla.org/integration/mozilla-inbound/rev/e5393e123ee2 Part 2b - Don't record activity in atomics unit tests, r=waldo. https://hg.mozilla.org/integration/mozilla-inbound/rev/20f8e61b78f5 Part 4a - Make recording optional in mozilla::RefCounted, r=ehsan. https://hg.mozilla.org/integration/mozilla-inbound/rev/ca20a3fbb10b Part 4b - Make recording optional in mozilla mutexes and monitors, r=froydnj. https://hg.mozilla.org/integration/mozilla-inbound/rev/8ef3895c248e Part 4e - Don't record various JS atomics, r=jandem. https://hg.mozilla.org/integration/mozilla-inbound/rev/99479560fb28 Part 4f - Don't record JS mutexes, r=fitzgen. https://hg.mozilla.org/integration/mozilla-inbound/rev/44b62ee96921 Part 4h - Don't record chaos mode counters, r=froydnj. https://hg.mozilla.org/integration/mozilla-inbound/rev/f007143a96eb Part 4k - Don't record deadlock detector lock, r=froydnj. https://hg.mozilla.org/integration/mozilla-inbound/rev/a64bcb2b5c4d Part 4l - Don't record some debugging/statistics atomics, r=froydnj. https://hg.mozilla.org/integration/mozilla-inbound/rev/51c0852ddbf6 Part 4m - Don't record some threading atomics, r=froydnj.
Pushed by bhackett@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/9042673fb235 Part 1b - Add libudis86 source, r=jandem. https://hg.mozilla.org/integration/mozilla-inbound/rev/92fc9eca3135 Part 10g - Initialize replay-specific and middleman state, r=billm.
Pushed by bhackett@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/34cf09869600 Part 1e - Disable crash reporting when recording/replaying, r=billm. https://hg.mozilla.org/integration/mozilla-inbound/rev/eec76ff04ff9 Part 5a - Disable incremental GC when recording or replaying, r=mccr8. https://hg.mozilla.org/integration/mozilla-inbound/rev/1f2c6099f852 Part 5b - Don't keep track of times or page fault counts in GC and helper thread activity when recording or replaying, r=sfink. https://hg.mozilla.org/integration/mozilla-inbound/rev/ab179dae314c Part 5c - Don't dispatch runnables for GC or finalization when under the GC and recording or replaying, r=mccr8. https://hg.mozilla.org/integration/mozilla-inbound/rev/6771fa9888f1 Part 5d - Disable compacting GC when replaying, r=jonco. https://hg.mozilla.org/integration/mozilla-inbound/rev/2af83b92e9c8 Part 5g - Disable finalization witnesses when recording or replaying, r=froydnj. https://hg.mozilla.org/integration/mozilla-inbound/rev/be1ad4d1217e Part 5i - Disable lazy and off thread JS parsing when recording or replaying, r=jandem. https://hg.mozilla.org/integration/mozilla-inbound/rev/7ff86265c080 Part 5j - Don't add GC events to timelines when recording or replaying, r=mccr8. https://hg.mozilla.org/integration/mozilla-inbound/rev/ab53a96c3b30 Part 5k - Don't generate debugger runnables on GC events, r=fitzgen. https://hg.mozilla.org/integration/mozilla-inbound/rev/0649e9060851 Part 5l - Don't trace refcounts while recording or replaying, r=froydnj. https://hg.mozilla.org/integration/mozilla-inbound/rev/a926716f38c4 Part 5n - Don't perform telemetry while recording or replaying, r=gfritzsche. https://hg.mozilla.org/integration/mozilla-inbound/rev/c0db8c1e5050 Part 6a - Disable media elements when recording or replaying, r=jesup. https://hg.mozilla.org/integration/mozilla-inbound/rev/ff3373e34204 Part 6c - Disable accelerated canvases when recording or replaying, r=dvander. https://hg.mozilla.org/integration/mozilla-inbound/rev/d558b836552b Part 6d - Disable wasm signal handlers when recording or replaying, r=luke. https://hg.mozilla.org/integration/mozilla-inbound/rev/ab75f0522bcc Part 6e - Disable the slow script dialog when recording or replaying, r=mrbkap. https://hg.mozilla.org/integration/mozilla-inbound/rev/67736c575b34 Part 7 - Ensure deterministic interaction of GC with CC and object references, r=smaug. https://hg.mozilla.org/integration/mozilla-inbound/rev/d9bbebacecd6 Part 8c - Mark places in the JS engine where recording events are disallowed and where the recording should be invalidated, r=jandem. https://hg.mozilla.org/integration/mozilla-inbound/rev/1e146aebbcc6 Part 8f - Ensure that PL and PLD hashtables have consistent iteration order when recording/replaying, r=froydnj. https://hg.mozilla.org/integration/mozilla-inbound/rev/70c285e729d9 Part 10f - Coordinate with snapshot mechanism in JS helper threads, r=fitzgen.
Pushed by bhackett@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/fa06b0a09780 Part 16 - Server side devtools changes, r=jimb.
All parts from this bug, and all parts from bug 1422587 excepting tests have landed and should be in tomorrow's nightly (Mac only for now). Features are turned on via the devtools.recordreplay.enabled pref (default to false for now) and accessed via the 'Tools -> Web Developer' menu. Things might not be super stable yet.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Depends on: 1489449
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: