Closed
Bug 1445219
Opened 7 years ago
Closed 7 years ago
Web Replay: Restore replaying process after crashes and hangs
Categories
(Core :: General, defect)
Tracking
RESOLVED
INVALID
Version | Tracking | Status |
---|---|---|
firefox61 | --- | affected |
People
(Reporter: bhackett1024, Assigned: bhackett1024)
Attachments
(8 files)
18.91 KB, patch
11.54 KB, patch
31.10 KB, patch
1.48 KB, patch
7.21 KB, patch
30.80 KB, patch
5.04 KB, patch
3.61 KB, patch
There are several intermittent crashes and hangs I've seen while using Web Replay. While of course it would be nice if these never happened, and while I will try to track down their sources, it would also be nice if these didn't have a big impact on people using the tool.
The middleman process knows where the replaying process is and what it's doing, and it would be cool if the middleman could detect that the replaying process has crashed or hung, then spin up a new replaying process and direct it to where the last one was. This would all happen transparently to the user, other than the time spent waiting while the new process gets back to that point. (Eventually this recovery could be exposed in the UI, and there is already a similar need to show the user progress when rewinding, but that will be left for the future.)
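As a rough illustration of the idea above, here is a minimal sketch of a supervisor that remembers the child's last reported position and replaces a crashed or hung child. Every name in it (`ChildState`, `ReplayingChild`, `ReplaySupervisor`, `RunTo`) is hypothetical and does not come from the attached patches; the child is a plain in-memory object rather than a real process.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a replaying process; not the real Web Replay types.
enum class ChildState { Running, Crashed, Hung };

struct ReplayingChild {
  ChildState state = ChildState::Running;
  std::uint64_t position = 0;  // abstract "point in the recording"

  // Replay forward until the target point is reached.
  void RunTo(std::uint64_t target) {
    assert(target >= position);
    position = target;
  }
};

// Hypothetical stand-in for the middleman's bookkeeping.
class ReplaySupervisor {
 public:
  ReplayingChild& Child() { return mChild; }
  int SpawnCount() const { return mSpawnCount; }

  // Remember where the current child is, so a replacement can be sent there.
  void NotePosition() { mLastPosition = mChild.position; }

  // If the child crashed or hung, replace it with a fresh one and run the
  // replacement to the last known position. Returns true if it did so.
  bool RecoverIfNeeded() {
    if (mChild.state == ChildState::Running) {
      return false;
    }
    mChild = ReplayingChild();
    ++mSpawnCount;
    mChild.RunTo(mLastPosition);
    return true;
  }

 private:
  ReplayingChild mChild;
  std::uint64_t mLastPosition = 0;
  int mSpawnCount = 1;
};
```

The sketch treats crashes and hangs identically; a real implementation would also have to kill the hung process and would pay the replay time that `RunTo` hides here.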
Comment 1 (Assignee) • 7 years ago
Remove the existing last-ditch restore infrastructure. This was used when a mismatch with the recording was detected while replaying: we would rewind to the last snapshot and try again, hopefully avoiding the problem. It wasn't enabled, though, and had rotted, and the approach in this bug is considerably more robust against the problems that crop up in practice.
Assignee: nobody → bhackett1024
Comment 2 (Assignee) • 7 years ago
Previously GeckoChildProcessHost used environment variables to determine what type of recording/replaying/middleman process to spawn. Not only is this really gross, it also makes it hard to have a middleman spawn multiple processes of different kinds at different times. This patch makes the spawned process type an explicit parameter to the functions that spawn processes.
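To make the change concrete, here is a small sketch of what "process type as an explicit parameter" could look like. The names `ReplayProcessKind`, `LaunchRequest`, and `ProcessLauncher` are assumptions for illustration only; this is not the `GeckoChildProcessHost` interface, and nothing is actually spawned.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical process kinds; the real patch may name these differently.
enum class ReplayProcessKind { None, Recording, Replaying, Middleman };

struct LaunchRequest {
  std::string binary;
  ReplayProcessKind kind;
};

// Hypothetical launcher that records requests instead of spawning anything.
class ProcessLauncher {
 public:
  // The kind is an argument of the call, not ambient state read from the
  // environment, so one parent can launch different kinds back to back.
  std::size_t Launch(const std::string& binary, ReplayProcessKind kind) {
    mLaunched.push_back(LaunchRequest{binary, kind});
    return mLaunched.size() - 1;
  }

  const LaunchRequest& At(std::size_t index) const {
    return mLaunched.at(index);
  }

 private:
  std::vector<LaunchRequest> mLaunched;
};
```

With the kind carried per call, a middleman can launch a recording process and later a replaying process without mutating any process-wide state in between.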
Comment 3 (Assignee) • 7 years ago
This patch changes breakpoint sets to happen via special IPC messages between the middleman and recording/replaying processes, rather than folding them into the DebuggerRequest JSON messages sent for other debugger actions. This is nice because recovery of a crashed process needs to treat changes to breakpoints differently from other messages, and it also allows some nice cleanups and simplifications in the code for managing debugger requests (breakpoint operations already had some special casing going on).
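One way to picture why separate breakpoint messages help recovery is sketched below. It assumes, and this is only an assumption, that a replacement process needs the current breakpoints reinstalled while ordinary debugger requests are not resent. `MessageKind`, `Message`, and `MiddlemanChannel` are hypothetical names, not the IPC types from the patch.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical message shapes; the real IPC messages are not shown in the bug.
enum class MessageKind { SetBreakpoint, ClearBreakpoint, DebuggerRequest };

struct Message {
  MessageKind kind;
  std::uint32_t breakpointId;  // unused for DebuggerRequest
  std::string payload;         // breakpoint position, or request JSON
};

class MiddlemanChannel {
 public:
  // Breakpoint changes update durable state; other requests pass through
  // without being remembered.
  void Send(const Message& msg) {
    switch (msg.kind) {
      case MessageKind::SetBreakpoint:
        mBreakpoints[msg.breakpointId] = msg.payload;
        break;
      case MessageKind::ClearBreakpoint:
        mBreakpoints.erase(msg.breakpointId);
        break;
      case MessageKind::DebuggerRequest:
        break;
    }
  }

  // Messages a replacement child would need in order to end up with the
  // same breakpoints as the child it replaces.
  std::vector<Message> MessagesForReplacement() const {
    std::vector<Message> out;
    for (const auto& entry : mBreakpoints) {
      out.push_back(
          Message{MessageKind::SetBreakpoint, entry.first, entry.second});
    }
    return out;
  }

 private:
  std::map<std::uint32_t, std::string> mBreakpoints;
};
```

Because the kind is visible on the message itself, the channel never has to parse request JSON to find out whether a message changes breakpoints.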
Comment 4 (Assignee) • 7 years ago
Fix a source of unhandled recording divergences which was causing unexpected (though not incorrect) behavior while testing.
Comment 5 (Assignee) • 7 years ago
We need to be able to provoke crashes in the replaying process in order to have automated testing for child process recovery.
Comment 6 (Assignee) • 7 years ago
Here are the main changes to the infrastructure needed so that the middleman process can spin up a new replaying process and take it to the same place where the last replaying process crashed. If the second replaying process crashes before getting to that point, we report the error to the user and give up; otherwise, the second process seamlessly takes over for future debugging.
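The single-retry policy described above can be sketched as follows. `RecoveryResult`, `RecoverCrashedChild`, and the `spawnAndRunTo` callback are hypothetical names introduced for illustration; the callback stands in for "spawn a replacement and run it to the crash point" and returns false if the replacement crashes first.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

enum class RecoveryResult { TookOver, GaveUp };

// Try exactly one replacement process. There is deliberately no loop: if the
// replacement also fails before reaching the crash point, record an error
// for the user and stop.
RecoveryResult RecoverCrashedChild(
    std::uint64_t crashPoint,
    const std::function<bool(std::uint64_t)>& spawnAndRunTo,
    std::string* errorOut) {
  assert(errorOut != nullptr);
  if (spawnAndRunTo(crashPoint)) {
    return RecoveryResult::TookOver;
  }
  *errorOut = "replacement replaying process crashed during recovery";
  return RecoveryResult::GaveUp;
}
```

Capping recovery at one attempt avoids an endless respawn loop when the crash is deterministic for a given recording.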
Comment 7 (Assignee) • 7 years ago
Comment 8 (Assignee) • 7 years ago
https://hg.mozilla.org/projects/ash/rev/04eb48072173246aa5802d27095b86378bdb799f
https://hg.mozilla.org/projects/ash/rev/be70121487bdec055dc650cb237a35b875caf28a
https://hg.mozilla.org/projects/ash/rev/c6c63a1763eaf086c0f7b6e0823603b35daeed60
https://hg.mozilla.org/projects/ash/rev/51460e01e0a618bafed775ba6d1a3ffce878fb92
https://hg.mozilla.org/projects/ash/rev/7c0d410e49fc155155b918042eb4c0159382a451
https://hg.mozilla.org/projects/ash/rev/01729d661d3e448ab8cc2a06db67ef7519a73f64
https://hg.mozilla.org/projects/ash/rev/50280c3f8e3b36d066195c19498aebcf0c833a16
Comment 9 (Assignee) • 7 years ago
This patch fixes an issue I saw where, if a process is resumed but then hangs, we don't try to detect the hang until something interacts with the process in some way (pressing the pause button, etc.). Now we start monitoring for a hang as soon as the process is resumed.
https://hg.mozilla.org/projects/ash/rev/51a0586397b6fa9297c78e35fb7107c6e634f0c0
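The behavior change in comment 9 can be illustrated with a small timer sketch in which monitoring begins at resume time rather than at the next user interaction. `HangMonitor` and its method names are hypothetical, the timeout value is arbitrary, and time is passed in explicitly so the logic can be exercised without sleeping.

```cpp
#include <chrono>

using TimePoint = std::chrono::steady_clock::time_point;

// Hypothetical hang detector; not the class used in the patch.
class HangMonitor {
 public:
  explicit HangMonitor(std::chrono::milliseconds timeout)
      : mTimeout(timeout) {}

  // Monitoring starts the moment the child is resumed.
  void OnResume(TimePoint now) {
    mMonitoring = true;
    mLastActivity = now;
  }

  // Any message from the child counts as a sign of life.
  void OnChildMessage(TimePoint now) { mLastActivity = now; }

  // A paused child is idle by design, so stop watching it.
  void OnPaused() { mMonitoring = false; }

  bool IsHung(TimePoint now) const {
    return mMonitoring && (now - mLastActivity) >= mTimeout;
  }

 private:
  std::chrono::milliseconds mTimeout;
  bool mMonitoring = false;
  TimePoint mLastActivity;
};
```

A caller would poll `IsHung` from a periodic timer in the middleman; with the old behavior, the equivalent of `OnResume` would not have started the clock.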
Comment 10 (Assignee) • 7 years ago
Closing this bug; all the changes here will be reviewed in separate bugs dependent on bug 1422587.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INVALID