Closed Bug 1448607 Opened 6 years ago Closed 6 years ago

Web Replay: Assorted fixes

Categories

(Core Graveyard :: Web Replay, enhancement)

enhancement
Not set
normal

Tracking

(firefox61 affected)

RESOLVED FIXED
Tracking Status
firefox61 --- affected

People

(Reporter: bhackett1024, Assigned: bhackett1024)

References

Details

Attachments

(41 files)

7.92 KB, patch
Details | Diff | Splinter Review
16.86 KB, patch
Details | Diff | Splinter Review
19.27 KB, patch
Details | Diff | Splinter Review
6.01 KB, patch
Details | Diff | Splinter Review
3.78 KB, patch
Details | Diff | Splinter Review
2.63 KB, patch
Details | Diff | Splinter Review
9.33 KB, patch
Details | Diff | Splinter Review
53.75 KB, patch
Details | Diff | Splinter Review
7.48 KB, patch
Details | Diff | Splinter Review
14.17 KB, patch
Details | Diff | Splinter Review
1.45 KB, patch
Details | Diff | Splinter Review
32.38 KB, patch
Details | Diff | Splinter Review
1.52 KB, patch
Details | Diff | Splinter Review
6.78 KB, patch
Details | Diff | Splinter Review
7.32 KB, patch
Details | Diff | Splinter Review
10.25 KB, patch
Details | Diff | Splinter Review
26.91 KB, patch
Details | Diff | Splinter Review
8.13 KB, patch
Details | Diff | Splinter Review
19.17 KB, patch
Details | Diff | Splinter Review
1.15 KB, patch
Details | Diff | Splinter Review
11.15 KB, patch
Details | Diff | Splinter Review
3.98 KB, patch
Details | Diff | Splinter Review
7.89 KB, patch
Details | Diff | Splinter Review
11.02 KB, patch
Details | Diff | Splinter Review
5.65 KB, patch
Details | Diff | Splinter Review
115.92 KB, patch
Details | Diff | Splinter Review
5.89 KB, patch
Details | Diff | Splinter Review
32.92 KB, patch
Details | Diff | Splinter Review
15.88 KB, patch
Details | Diff | Splinter Review
3.00 KB, patch
Details | Diff | Splinter Review
247.67 KB, patch
Details | Diff | Splinter Review
121.83 KB, patch
Details | Diff | Splinter Review
197.41 KB, patch
Details | Diff | Splinter Review
17.27 KB, patch
Details | Diff | Splinter Review
32.42 KB, patch
Details | Diff | Splinter Review
23.25 KB, patch
Details | Diff | Splinter Review
2.42 KB, patch
Details | Diff | Splinter Review
293.03 KB, patch
Details | Diff | Splinter Review
3.63 KB, patch
Details | Diff | Splinter Review
99.59 KB, patch
Details | Diff | Splinter Review
57.66 KB, patch
Details | Diff | Splinter Review
This bug is for minor fixes and improvements related to Web Replay.  I've been filing separate bugs for different areas of improvement, but there is overlap between those areas so I think this has been making development harder to follow, not easier.
After bug 1438678 preferences are now conveyed to child processes using a shared memory block.  This doesn't work in the presence of a middleman --- the middleman can use the shmem but not the child process --- so this patch adds some logic to pass values for preferences through to the child process via the IPC used when setting up the child process.

https://hg.mozilla.org/projects/ash/rev/2b244ab90b7f1a22fcaf899b4637cd3d7e78ed4a
Assignee: nobody → bhackett1024
Now that the old gecko style code has been removed we can't continue using it as a crutch in lieu of getting things working with stylo.  Fix some atomics in the crossbeam-deque rust crate so that this work stealing queue behaves the same while recording and replaying (otherwise the stylo worker threads behave differently), and tidy up the interface used for calling into web replay from rust code to use the normal FFI instead of a goofy hack based on the write() system call.  With this patch (and some existing changes to atomics in rayon-core) stylo seems to record/replay consistently, at least on the simple websites I've tried.

https://hg.mozilla.org/projects/ash/rev/ae14af2ea100f10c1609c0e634433e03db6c366f
Fix a bug where we weren't setting up the debugger right to consistently call the handler for OnPop and EnterFrame breakpoints.  This also improves spew output when using the debugger and the RECORD_REPLAY_SPEW env var.
Fix some recording bugs that weren't showing up in automated tests but were causing crashes and other undesirable behavior.

https://hg.mozilla.org/projects/ash/rev/a5f42db051530c901ae3b1e4201537afd733bbda
Comment on attachment 8962111 [details] [diff] [review]
Part 2 - Record/replay work stealing queue behavior, tidy up Rust/RR interface.

Review of attachment 8962111 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/recordreplay/rayon-core/src/registry.rs
@@ +529,2 @@
>          if !self.breadth_first {
> +            RecordReplayRust_Assert("WorkerThread::take_local_job #2\0".as_ptr());

Totally drive-by (sorry!), but since rayon is a vendored crate which I doubt would like to take this patch upstream (unless a more general API is created for this, which you may or may not want to work on), I think it'd make sense to just disable parallelism when recording / replaying here:

  https://searchfox.org/mozilla-central/rev/8220783953b0311e1d64c2366f732a159f05ed7e/servo/components/style/gecko/global_style_data.rs#84

Or here:

  https://searchfox.org/mozilla-central/rev/8220783953b0311e1d64c2366f732a159f05ed7e/layout/style/ServoStyleSet.cpp#1563

That may be easier to land, unless you have a case for replaying some rust code that isn't style / can't be made non-parallel easily...
Thanks for the tip!  The rust changes are pretty provisional and definitely not in a landable shape (see also part 3b in bug 1422587).  Eventually I want to support running rust code in parallel, but I don't know yet whether that is better done by upstreaming changes (which may be involved --- I think we'd need a replayableAtomic crate or something and a dependency from rayon-core, crossbeam-deque, and any other vendored crates that need atomic instrumentation) or hooking entry points in these crates (which seems feasible, but I haven't investigated yet).  It's great to know though that Stylo can be forced to run in a single threaded fashion, that is definitely the simplest and most expedient path for when it comes time to land.

(In reply to Emilio Cobos Álvarez [:emilio] from comment #6)
> Comment on attachment 8962111 [details] [diff] [review]
> Part 2 - Record/replay work stealing queue behavior, tidy up Rust/RR
> interface.
> 
> Review of attachment 8962111 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: toolkit/recordreplay/rayon-core/src/registry.rs
> @@ +529,2 @@
> >          if !self.breadth_first {
> > +            RecordReplayRust_Assert("WorkerThread::take_local_job #2\0".as_ptr());
> 
> Totally drive-by (sorry!), but since rayon is a vendored crate which I doubt
> would like to take this patch upstream (unless a more general API is created
> for this, which you may or may not want to work on), I think it'd make sense
> to just disable parallelism when recording / replaying here:
> 
>  
> https://searchfox.org/mozilla-central/rev/
> 8220783953b0311e1d64c2366f732a159f05ed7e/servo/components/style/gecko/
> global_style_data.rs#84
> 
> Or here:
> 
>  
> https://searchfox.org/mozilla-central/rev/
> 8220783953b0311e1d64c2366f732a159f05ed7e/layout/style/ServoStyleSet.cpp#1563
> 
> That may be easier to land, unless you have a case for replaying some rust
> code that isn't style / can't be made non-parallel easily...
Add redirections for some APIs called by OSPreferences_mac.cpp which we were missing.
Graphics data buffer strides were not being computed correctly.
This removes some redundant code between the main record/replay stuff and the debugger.

https://hg.mozilla.org/projects/ash/rev/8f82c9c275b9
This removes all the changes made to NSPR and NSS in the past.  Changing NSPR initialization is no longer necessary because NSPR is not initialized until the main() routine starts, and so the record/replay system can be initialized with a normal function call.  This also allows removing the crufty system of passing record/replay initialization settings via environment variables rather than command line flags.  The changes for recording/replaying atomic accesses within NSS might still be important, but I haven't seen problems with removing these so far and if they are an issue in the future it would be better to find a solution that doesn't involve modifying NSPR or NSS (see bug 1303779 and bug 1303785).
When rewinding is disabled there isn't a reason to have more than one child process, so this makes the associated changes.  The main reasons to disable rewinding are to speed things up when recording or replaying, or to debug C++ code in a replaying process with lldb.

https://hg.mozilla.org/projects/ash/rev/e7499938a0a5
Recording processes can't be inspected via eval() or other methods that have side effects.  When a child process is paused at a breakpoint the debugger needs to be able to do this, and previously it would get back errors like 'Unhandled recording divergence in blah blah' when trying to do these simple things.  The debugger should get the same behavior regardless of whether it is paused at a breakpoint at the current point of the recording process, or has rewound to earlier places (where a replaying process will be able to do the eval()), so this patch switches the active child to a replaying process when the recording process hits a breakpoint.  This makes things a little slower when hitting the breakpoint but on the simple pages tested so far hasn't been a problem (the replaying children should be pretty current with the point of the recording child and will be able to catch up quickly).

https://hg.mozilla.org/projects/ash/rev/70d28bbbfbce
Avoid printing some redirection gunk when record/replay spew is disabled.

https://hg.mozilla.org/projects/ash/rev/1a4097398b73
Previously the debugger in a recording/replaying process would respond to findScripts queries with all the scripts currently in existence, but would not notify the debugger in the middleman about newly created scripts, so that if the debugger had a breakpoint on a script that didn't exist when it attached then it would not get installed as execution went forward.  This adds a special type of breakpoint for notifying the middleman about points where new scripts have been created, and also modifies the debugger server to query newly installed scripts even for sources it already has seen, in case we rewind and then try to install a breakpoint on a source whose scripts don't exist yet.

https://hg.mozilla.org/projects/ash/rev/591fc2679dcc
When switching to a recording/replaying tab from another tab, the recording/replaying tab would not update its graphics immediately and would end up giving a spinner.  This patch changes things to listen to the IPDL message for re-rendering layers after switching to a tab and responding appropriately.

https://hg.mozilla.org/projects/ash/rev/2caf8a172b27
Environment variables and command line arguments were not being included in the recording, leading to problems when recording/replaying executions with different machines/configurations, e.g. separately replaying a recording produced by a mochitest.

https://hg.mozilla.org/projects/ash/rev/827beabce512
The temporary checkpoints used when navigating around breakpoints in a replaying process can't be saved when rewinding is not allowed, so treat replaying processes like a recording process in such a situation.

https://hg.mozilla.org/projects/ash/rev/872023894bb2
By keeping track of not just whether a child process is paused but the kind of place it is paused at (checkpoint, breakpoint, recording endpoint) we can more cleanly handle the replay->record transition when running forward past the recording endpoint and avoid some reentrancy issues.

https://hg.mozilla.org/projects/ash/rev/2505712ef088
When reverse-stepping past the beginning of a frame we wouldn't do the expected thing of stepping out to the calling frame, but would instead continue travelling back in time until another call to the same function was found.  This was kind of hard to fix as is; the fundamental problem is when setting an onStep breakpoint in a frame the replaying process would decide the exact opcodes which it should stop at, and when reverse-stepping out it would not pick the right opcode.  This patch removes the code the replaying process uses to determine where to stop, and instead requires the debugger to specify the offsets to stop at when setting an onStep handler on a recording/replaying frame.  This allows the server to use code shared with the normal onStep handler for figuring out where to stop, and allows for straightforwardly making reverse-stepping behave as it should.

https://hg.mozilla.org/projects/ash/rev/cc6ceec36386
Recording/replaying processes create a bunch of stuff in /tmp for temporary recording files, and stuff saved by memory/thread snapshots for use in rewinding.  Previously this stuff would just be left around after the process was killed; this patch changes things so that these files will be cleaned up when the process exits.  This also fixes a bug where middleman processes could be saved for use as spare children by the UI process and potentially used for new non-recording/replaying tabs.

https://hg.mozilla.org/projects/ash/rev/2bf36a3eeff6
Previously the code for keeping track of lock acquire orders could emulate the behavior of locks and cvars while recording.  This was put in to allow saving/restoring checkpoints in a recording process while threads were blocked while waiting on a lock or cvar, but since recording processes can no longer save/restore checkpoints this isn't necessary and this complicated code can be removed.  This is especially nice to fix because pthread_cond_timedwait and related functions were emulated with a busy-wait, and the recording child would saturate one or more cores even when it was idle or was paused while a replaying child ran.

https://hg.mozilla.org/projects/ash/rev/025cbb25ba02
Sometimes a child process would hang instead of properly reporting a crash to the middleman.

https://hg.mozilla.org/projects/ash/rev/6ef6dcb0e7ae
The combination of parts 10 and 12 could cause a recording process to hit a lot of NewScript breakpoints when loading a complicated page, and the cost of switching to a replaying process and then back to the recording process on each one would cause quite a bit of jank.  This patch sticks with the recording child as the active one when hitting a NewScript breakpoint (the debugger shouldn't be doing any evals or anything here anyways).

https://hg.mozilla.org/projects/ash/rev/bdb5c666bdd6
A terminated hung/crashed child might not be totally dead (especially if it hasn't really hung) and could continue sending confusing messages to the middleman.

https://hg.mozilla.org/projects/ash/rev/e1dfdf07e24a
Add some missing redirections and record/replay assertions to fix/diagnose some reported crashes.

https://hg.mozilla.org/projects/ash/rev/491699a948fd
Allow recording processes and replaying processes that can't rewind to report crashes to the middleman, and for the middleman to detect hangs in such processes.  The middleman can't do anything with these other than report them to the UI process (where they show up as an alert), but this should help with diagnosing stalls when recording pages.  This also fixes a couple of things with the cycle collector / GC interface which were exposed by this patch.

https://hg.mozilla.org/projects/ash/rev/bab17066d637
Creating or resolving promises would query the current time if they were being debugged, leading to recording event mismatches when replaying.  This patch fixes things so they always query the current time, and also removes a record/replay assertion that was causing problems while testing.

https://hg.mozilla.org/projects/ash/rev/0c74c0ed0fc4016e3d82266a2f2085f928300124
Up until now the replaying process debugger kept track of the exact points of execution (particular breakpoint hits, etc.) by keeping track of the order in which various script locations had been hit when executing from the last checkpoint.  This adds a lot of complexity to the debugger as it tries to return to particular execution points, and doesn't scale well when there are many such script locations in the ordering (e.g. set a breakpoint on the millionth iteration of a loop's execution).  This patch modifies the JS engine to maintain a global progress counter when recording or replaying, which is incremented when entering a script or executing a LOOPENTRY opcode and maintains the same value regardless of how the code has been JIT compiled.  Since no script location can be reached twice with the same value for the progress counter, this greatly simplifies representation of execution points and the associated debugger code.

https://hg.mozilla.org/projects/ash/rev/b340de499db99d1676483ed1d1fdcc8f79d8ef5b
This patch has some fixes so that typing in <textarea> elements works when recording/replaying.  A few new redirections were needed, and some modifications to PRemoteSpellCheckEngine to not use raw pointers as IPDL identifiers.

https://hg.mozilla.org/projects/ash/rev/bece93a9d018
This patch adds --save-recordings options to firefox and mochitests, allowing all content processes created while running to be recorded, and that recording saved into a specified directory when the content process finishes up.  This also has a fair number of fixes to get the first browser mochitest working --- some robustness improvements to IPDL message forwarding, unloading layers when asked so that tab switches are detected properly, and a few intermittent failures introduced by a recent merge.

https://hg.mozilla.org/projects/ash/rev/16ff386e9384
Previously, graphics data was sent from child processes to the middleman via the socket pair used by other child-middleman messages.  This turned out (not surprisingly) to be really slow, >20ms to send one frame of data.  This patch sets up a shared memory block that is shared between the middleman and all its children, and is used to send the actual rendered data.  Scrolling and animations feel more responsive now, but still a little choppy --- the UI process compositor doesn't seem to be rendering these frames fast enough, I'll look into what's going on there soon.
Composites are scheduled differently from normal when recording or replaying, at the top of the main thread's event loop rather than via the PVsync protocol.  This interacts poorly with the code that schedules composites when an animation is playing, causing composites to be scheduled over and over (sometimes thousands of them) without any actual painting occurring.  I'm not sure what the best fix for this is, but this patch disables the eager scheduling of composites when animating while recording or replaying, avoiding this problem and keeping the page responsive.

https://hg.mozilla.org/projects/ash/rev/afbec691ce457e2c2b75278cc09f65ff00fac408
This patch removes all the changes to rayon and other upstream rust packages which were added to record/replay the atomics used by those packages.  Instead, the rayon thread pool is not created when recording/replaying, as suggested in comment 6.

https://hg.mozilla.org/projects/ash/rev/65a41db1339a9b7298d205205ccf2e09f941dfdb
Attachment #8980886 - Attachment is patch: true
Remove all the record/replay assertions added to various parts of Gecko.  These are useful for diagnosing problems but as preparation for landing they don't really belong in the codebase.

https://hg.mozilla.org/projects/ash/rev/70c30cef280ddeed0ffd8be9a62df89753a239d5
This patch removes all the Windows redirections and associated code.  While I want to revisit this and get Web Replay working on Windows, that won't happen for a while and there isn't much point in keeping all this bitrotted code in the tree.

https://hg.mozilla.org/projects/ash/rev/2bdd47c924d322c98605f1f7a3f92486470149da
The plan for the (second) initial landing attempt is to only enable Web Replay on Mac nightlies.  This patch makes that change and makes API functions into no-ops on other configurations (e.g. IsRecordingOrReplaying() just returns false).

https://hg.mozilla.org/projects/ash/rev/079566e96397a477330e4422bf09a4062345ba33
Over time some crufty changes have accumulated in other parts of Gecko.  This patch fixes these either by removing them or by adding comments to describe why they are necessary.

https://hg.mozilla.org/projects/ash/rev/1262f537e29f39784663f9686f4b145d8aa1a3d3
More cleanup.  This makes ash line up with the series of patches I've been posting for review to bugs under bug 1422587.  It will be those patches that eventually land on m-c, not the changesets on ash.

https://hg.mozilla.org/projects/ash/rev/cf468c71620e4f4da9a2f7cff53cd6ed8a974478
The replay debugger method for getting the symbol properties on an object was broken and returned non-symbol properties as well, causing duplicate properties when inspecting objects in the debugger.

https://hg.mozilla.org/projects/ash/rev/aebb5db41acd185da4a07b19597c1125528e91cc
Bring things in sync with the changes made for reviews of patches dependent on bug 1422587.

https://hg.mozilla.org/projects/ash/rev/47b183640006
Fix a bug introduced by some of the changes made for sandboxing.
This patch adds several new features:

- When using the web console on a recording/replaying tab, console messages and errors will be shown.  The console will show all known messages, including those from the future after the process has rewound.  (It would be nice to distinguish those which haven't occurred yet in the UI but that hasn't been done yet.)

- Right clicking on a console message / error gives a new option to warp to the point where either the message was generated, or where the error object was originally created.

- Rewinding to the beginning of a recording, or playing forward to the end when there is no recording process, will place the debugger UI in a paused state rather than leaving it in its running state even though the child process cannot do anything.

https://hg.mozilla.org/projects/ash/rev/1a73b46e4e21
Component: General → Web Replay
Besides work applying and fixing up patches per reviews, I've started work again on recording/replaying mochitests.  This patch has fixes to get the accessibility parts of the browser mochitests (50 odd test files) passing: recording all content processes started by the test without test failures or other errors, and then replaying all those recordings.

https://hg.mozilla.org/projects/ash/rev/26adb2c44ed0d2abc269adf5182e9214152bb750
Web Replay has landed to inbound and the ash repository will be retired pretty soon.  I'll be filing separate bugs for all the stuff here that hasn't landed to inbound.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: