Closed Bug 1433495 Opened 6 years ago Closed 6 months ago

Save minidump files if MINIDUMP_SAVE_PATH is set to preserve crash dumps

Categories

(Testing :: geckodriver, enhancement, P2)

Version 3
enhancement
Points:
1

Tracking

(firefox120 fixed)

RESOLVED FIXED
120 Branch
Tracking Status
firefox120 --- fixed

People

(Reporter: whimboo, Assigned: whimboo)

References

(Blocks 1 open bug)

Details

(Whiteboard: [webdriver:m9])

Attachments

(1 file, 9 obsolete files)

48 bytes, text/x-phabricator-request
Details | Review
In case of mozcrash for Python there is MINIDUMP_SAVE_PATH support, which basically means that the minidump files as created by Firefox will be saved off to a given folder before the profile gets removed.

As long as we cannot (want to) handle crash analysis, we should at least offer a way to users to safe their minidump files. This is not possible for now, and causes a bit of pain for me to get investigated.

Should be a trivial addition, and would be good to have for the 0.20 release.
Summary: Safe minidump files if MINIDUMP_SAVE_PATH is set → Save minidump files if MINIDUMP_SAVE_PATH is set
Priority: -- → P3
Blocks: 1441204
No longer blocks: 1401129
Comment on attachment 8954393 [details]
Bug 1433495 - Save minidump files if MINIDUMP_SAVE_PATH is set.

https://reviewboard.mozilla.org/r/223478/#review229584

::: testing/mozbase/rust/mozrunner/Cargo.toml:12
(Diff revision 1)
>  license = "MPL-2.0"
>  
>  [dependencies]
>  log = "0.4"
>  mozprofile = { path = "../mozprofile" }
> +walkdir = "2"

Remember to run "./mach vendor rust" so it gets put in third_party/rust.

::: testing/mozbase/rust/mozrunner/src/minidump.rs:6
(Diff revision 1)
> +use std::env;
> +use std::fs;
> +use std::path::PathBuf;
> +use walkdir::WalkDir;
> +
> +pub fn save_minidump_files(path: &PathBuf) -> () {

This code looks ok-ish, but I’d prefer a slightly different API.

First of all I think it would be better if you split this function in
two: one part providing an iterator over minidump files given a Profile,
the other consuming the minidumps and copying them to a given location.

It is also possible the iterator part should live in mozprofile?

::: testing/mozbase/rust/mozrunner/src/minidump.rs:10
(Diff revision 1)
> +
> +pub fn save_minidump_files(path: &PathBuf) -> () {
> +    // Until we can analyze crash reports similar to mozcrash
> +    // lets save available minidump files to a safe location
> +    // if the MINIDUMP_SAVE_PATH env variable has been set.
> +    if let Ok(val) = env::var("MINIDUMP_SAVE_PATH") {

I think probably this should live in the Drop implementation,
and that the function to save minidumps should take a profile
(with an iterator) and a destination path instead.

::: testing/mozbase/rust/mozrunner/src/minidump.rs:15
(Diff revision 1)
> +            println!("MINIDUMP_SAVE_PATH set, but {:?} is not a valid path", save_path);
> +            return;

Return Err here.

::: testing/mozbase/rust/mozrunner/src/minidump.rs:26
(Diff revision 1)
> +                match fs::copy(entry.path(), &save_path) {
> +                    Ok(_) => {
> +                        println!("Saved {:?} as {:?}", entry.file_name(), save_path);
> +                    }
> +                    _ => {}
> +                }

Can we have this function return Result, and do away with the match stataement?

::: testing/mozbase/rust/mozrunner/src/minidump.rs:28
(Diff revision 1)
> +        for entry in walker.filter_map(|e| e.ok()) {
> +            if entry.path().is_file() {
> +                save_path.set_file_name(entry.file_name());
> +                match fs::copy(entry.path(), &save_path) {
> +                    Ok(_) => {
> +                        println!("Saved {:?} as {:?}", entry.file_name(), save_path);

Let’s use the info!() macro for this.
Attachment #8954393 - Flags: review?(ato) → review-
Comment on attachment 8954393 [details]
Bug 1433495 - Save minidump files if MINIDUMP_SAVE_PATH is set.

https://reviewboard.mozilla.org/r/223478/#review229584

> Remember to run "./mach vendor rust" so it gets put in third_party/rust.

We don't have to because it is already in the tree. But I see that this crate is at version 1.0.7 only. We may have to check which other crates depend on it, or if we can just use that version.

> This code looks ok-ish, but I’d prefer a slightly different API.
> 
> First of all I think it would be better if you split this function in
> two: one part providing an iterator over minidump files given a Profile,
> the other consuming the minidumps and copying them to a given location.
> 
> It is also possible the iterator part should live in mozprofile?

The changes here are actually more as what I can do today. Andreas, if you want to pick this up please do it. I would appreciate. I expected that the API is not that great, because I just wanted to get something working.

> Return Err here.

Note that the caller might want to silently ignore the error. We should make this a hard failure.

> Can we have this function return Result, and do away with the match stataement?

If you do a refactoring and this method copies only a single file then this will be fine. With the current implementation we would abort the copy process in the middle and leaving other minidump files behind.
So, post our discussion, I think that the right approach here is to make sure `delete_session` in the handler is called before shutdown and make the marionette implemenation of `WebDriverHandler` do the cleanup in `delete_session`. For the case that the session is explicitly closed that will work. For the case that there's a browser crash and some command fails that will work. The case for which that won't work is if geckodriver is terminated with a signal with an active session (in this case it would olny matter if the browser had crashed after the last command but before termination).

In theory the way we want to handle that case is to install a signal handler that responds to SIGTERM by running the cleanup steps and then shutting down the server. The chan-signal crate seems like the best current option for doing this, and with a small refactor I was able to dispatch a Quit message on ctrl-c and have that run the `delete_session` code. Unfortunately there are two problems:

* It doesn't work on Windows
* It isn't actually possible to shut down hyper 0.10 [1]

So I think to make this work in all cases it would be necessary to upgrade to a newer hyper, which would be a significant refactor since the new version is designed for event-based concurrency rather than the thread-per-connection approach of 0.10, which we are using in a single-threaded fashion.

In the interests of Science, I tried implementing Drop that called delete_session, and afaict that isn't actually called on a signal, so I don't think a purely Drop-based solution will work.

The conclusion here is that I don't think we can make this work in all cases at the moment without significant effort. In the meantime, I think the right thing to do for minidumps is to add a method somewhere in mozprofile that clears out the minidumps (I would even say that the method should take a destination path rather than hardcoding the environment variable), and that should be called from `delete_session`. If we then want to plug the hole in the above approach we will need to do some more fundamental structural work to enable us to shut down the web server cleanly.

[1] https://github.com/hyperium/hyper/issues/338
Summary: Save minidump files if MINIDUMP_SAVE_PATH is set → Save minidump files if MINIDUMP_SAVE_PATH is set to preserve crash dumps
See Also: → 1460307
Blocks: 1466818
No longer blocks: 1441204
Blocks: 1485259
No longer blocks: 1485259
Blocks: 1490906
No longer blocks: 1466818
I would appreciate to have this feature in geckodriver as soon as possible. Reasoning is that we have a couple of crashes of Firefox in the wdspec test suite which cannot be analyzed. As such there is nothing I can do to get the tests passing again.

Andreas, you mentioned that you have a half-baked patch locally. In the case you cannot complete it due to time constraints would you mind to at least upload this patch to this bug? In such a case I would like to pick it up. But if you could finish it, I would appreciate it. Thanks.
Flags: needinfo?(ato)
Attached patch WIP patch from Andreas (obsolete) — Splinter Review
As discussed in the meeting today, here is the WIP patch Andreas sent me early in April this year. 

Andreas, maybe you can skim over and check if that is the one you were referring to. If not lets get it obsoleted. Thanks.
Assignee: nobody → hskupin
Thanks for attaching the lost patch.  I had managed to almost
recreate it locally, and the result was surprisingly similar.

I will look at this once I’m done with my P1’s.
Assignee: hskupin → ato
Flags: needinfo?(ato)
Priority: P3 → P2
Assignee: ato → hskupin
Blocks: 1495062
No longer blocks: 1491288
As discussed during the work week Andreas wants to take care of this bug after bug 1430064 has been fixed.
Assignee: hskupin → ato
Depends on: 1430064

Moving to the 0.25 release of geckodriver for which we might be able to get it fixed.

Assignee: ato → nobody
Blocks: 1520585
No longer blocks: 1495062
No longer blocks: 1520585
No longer blocks: 1573798
Blocks: 1584911
See Also: → 1613064
Blocks: 1585036
Assignee: nobody → nalexander
Status: NEW → ASSIGNED

Part 2a was what I took from an existing ticket -- perhaps it was from
ato? Here are my changes.

Depends on D62092

This is just some noodling on how to write automated tests for this
stuff. Browsertime can run fairly arbitrary Node.js scripts to drive
the browser; my hope was to navigate to about:crash{content,parent}
and teach the harness about expected crashes, etc. That doesn't work
100% smoothly 'cuz loading about: pages isn't 100% solid in
Browsertime (maybe better now), so I use privileged JS to do it. This
patch mostly captures that.

The changes to Raptor to accommodate this type of testing... those are painful.

Depends on D62096

Nick, do you expect me to pick-up the remaining work, or do you have the time to address review comments? Note that not all the patches seem to be geckodriver only. I can see a couple which are eg Raptor related, and which should not end-up on this bug.

Flags: needinfo?(nalexander)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #17)

Nick, do you expect me to pick-up the remaining work, or do you have the time to address review comments? Note that not all the patches seem to be geckodriver only. I can see a couple which are eg Raptor related, and which should not end-up on this bug.

Yeah, I realized that the bug number was incorrect -- see https://bugzilla.mozilla.org/show_bug.cgi?id=1585036#c9. As for what I can do, see https://bugzilla.mozilla.org/show_bug.cgi?id=1585036#c8 -- I could definitely push the adb pieces across the line, and potentially the geckodriver pieces if you're mostly happy with them. If you want major changes to the geckodriver bits, I'll need others (you!) to pick them up. Let me know?

Flags: needinfo?(nalexander) → needinfo?(hskupin)
Assignee: nalexander → nobody
Status: ASSIGNED → NEW
Assignee: nobody → gmierz2
Status: NEW → ASSIGNED
Assignee: gmierz2 → nobody
Status: ASSIGNED → NEW
Attachment #9125145 - Attachment is obsolete: true
Attachment #9125146 - Attachment is obsolete: true
Attachment #9125148 - Attachment is obsolete: true
Attachment #9125151 - Attachment is obsolete: true
Attachment #9125154 - Attachment is obsolete: true
Attachment #9125155 - Attachment is obsolete: true
Assignee: nobody → gmierz2
Status: NEW → ASSIGNED
Assignee: gmierz2 → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(hskupin)

I will keep the needinfo for me to follow-up on Alexander's patches.

Flags: needinfo?(hskupin)
Blocks: 1649094
No longer blocks: 1584911
Blocks: 1668243
No longer blocks: 1649094
Flags: needinfo?(hskupin)
No longer blocks: 1668243
Blocks: 1686110
Blocks: 1723202
No longer blocks: 1686110
Depends on: 1725622

I don't see why bug 1430064 should actually block this bug from getting fixed. Removing it from the dependency list.

No longer depends on: 1430064

Sadly we have to move this bug to the next 0.32.0 release.

Blocks: 1750691
No longer blocks: 1723202
Blocks: 1794560
No longer blocks: 1750691
Severity: normal → S3
Attachment #9125156 - Attachment is obsolete: true
No longer blocks: 1794560
No longer blocks: 1814050
Blocks: 1848850

I had another quick look at this bug and here are some updates:

  1. If the user has the MINIDUMP_SAVE_PATH environment variable set when starting geckodriver it will automatically be forwarded to Firefox and Firefox itself will place the .dmp and .extra file into the given folder. There is no actual copying necessary by geckodriver.

  2. The automatic copying will not take place for Firefox on Android. Here we would have to set it to a specific folder that we can then use as source for the copy operation as done by mozdevice. Good thing is that we can do that also after the profile has been removed.

  3. We could consider to add an extra argument to geckodriver like --minidump-path which then would override the environment variable and allows a more visible usage of this feature.

Step 3 shouldn't be too much work, and step 2 is probably easy as well. With that change we could then attach minidump files as artifacts to the Treeherder wdspec jobs.

Lets discuss in the next triage meeting.

Whiteboard: [webdriver:triage]
Blocks: 1859377

As we discussed we are fine with just the environment variable. That way minidump files will be saved off to the proper location on disk. To include these as artifacts to the wdspec job, we will have to update the mozharness script to appropriately.

We should also update https://firefox-source-docs.mozilla.org/testing/geckodriver/CrashReports.html so that it also makes use of this environment variable.

For Android support which can be follow-up work I filed bug 1859377.

Whiteboard: [webdriver:triage] → [webdriver:backlog]
Attachment #9009656 - Attachment is obsolete: true
Attachment #8954393 - Attachment is obsolete: true

Actually I just had a look at the mozharness script for web-platform tests and it actually uses the MINIDUMP_SAVE_PATH environment variable and also successfully forwards it to geckodriver and Firefox. As it can be seen for this wdspec test job the minidump files are correctly uploaded as artifacts.

That means we only need to update the documentation here to allow users to grab and share minidump files.

Assignee: nobody → hskupin
Status: NEW → ASSIGNED
Points: --- → 1
Whiteboard: [webdriver:backlog] → [webdriver:m9]
Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/79b015309dbe
[geckodriver] Update documentation in how to retrieve crash reports from Firefox. r=webdriver-reviewers,jdescottes DONTBUILD
Status: ASSIGNED → RESOLVED
Closed: 6 months ago
Resolution: --- → FIXED
Target Milestone: --- → 120 Branch
See Also: → 1882338
No longer blocks: 1489130, 1585036, 1859377
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: