Add support to raptor-browsertime to check for browser crashes for desktop
Categories
(Testing :: Raptor, enhancement, P2)
Tracking
(Not tracked)
People
(Reporter: rwood, Assigned: aglavic)
References
(Depends on 1 open bug, Blocks 3 open bugs)
Details
(Whiteboard: [fxp])
Attachments
(2 files)
Check for and report any browser crashes after raptor-browsertime test runs.
Reporter | ||
Updated•5 years ago
|
Comment 2•5 years ago
•
|
||
Browsewrtime uses gecko driver to drive Firefox
Because of this we use a "temp" profile that is created and deleted by the driver
We can use this strategy to save the profile and collect the crash data:
https://firefox-source-docs.mozilla.org/testing/geckodriver/CrashReports.html
Notes on JS: https://selenium.dev/selenium/docs/api/javascript/module/selenium-webdriver/firefox.html
Or force browsertime/driver to use the preexisting profile and not clone info from a template
https://github.com/mozilla/browsertime/blob/master/lib/firefox/webdriver/index.js#L52
Updated•5 years ago
|
Updated•5 years ago
|
I talked with Tarek today and we agreed that I will pick up this ticket.
Updated•5 years ago
|
Updated•5 years ago
|
(In reply to Nick Alexander :nalexander [he/him] from comment #3)
I talked with Tarek today and we agreed that I will pick up this ticket.
And now it's been deprioritized.
For whomever looks at this next, we can force crashes in various processes by using chrome JS to load URIs, like:
await commands.js.runPrivileged('window.BrowserApp.selectedTab.browser.loadURI("about:crashcontent");');
await commands.js.runPrivileged('window.BrowserApp.selectedTab.browser.loadURI("about:crashparent");');
An update on this: locally, I have geckodriver pulling minidumps off the device. I still need to wire up the mozcrash
pieces, but that looks almost trivial once the minidumps are captured.
What I don't have great ideas for is how to test this in automation. Using privileged JS (see #c4) we can trigger various crashes, but adding Raptor/Browsertime test jobs that expect crashes is pretty ugly because there are a lot of assumptions baked into the existing code about pageload. (One can see the mismatch with the way the scenario tests were wedged in, and to a lesser extent, the way the benchmark tests were implemented.) At the end of the day I will probably add a new "type" and some flags to the test descriptions to describe the expected crashes, and the whole framework will get yet more overloaded :(
For the record, we see OOM crashes with the Window Recorder under browsertime a lot. They look like this "green" job, which crashes during the second run. (I.e, doesn't write 2.mp4
.) Note the browsertime error:
[task 2020-01-31T19:03:09.004Z] 19:02:39 INFO - [2020-01-31 19:02:39] VERBOSE: [browsertime.command.measure] Collecting metrics
[task 2020-01-31T19:03:09.004Z] 19:02:39 INFO - [2020-01-31 19:02:39] INFO: [browsertime.video] Stop firefox window recorder.
[task 2020-01-31T19:03:09.004Z] 19:02:39 INFO - [2020-01-31 19:02:39] TRACE: [browsertime] Executing privileged async script const cb = arguments[arguments.length-1];
[task 2020-01-31T19:03:09.004Z] 19:02:39 INFO - Promise.resolve(
[task 2020-01-31T19:03:09.004Z] 19:02:39 INFO - windowUtils.setCompositionRecording(false)
[task 2020-01-31T19:03:09.004Z] 19:02:39 INFO - )
[task 2020-01-31T19:03:09.004Z] 19:02:39 INFO - .then(() => cb());
[task 2020-01-31T19:03:09.004Z] 19:02:39 INFO - at SeleniumRunner.runPrivilegedAsyncScript (/builds/task_1580497051/fetches/browsertime/node_modules/browsertime/lib/core/seleniumRunner.js:417:9)
[task 2020-01-31T19:03:09.004Z] 19:02:39 INFO - at enableWindowRecorder (/builds/task_1580497051/fetches/browsertime/node_modules/browsertime/lib/video/screenRecording/firefox/firefoxWindowRecorder.js:31:18)
[task 2020-01-31T19:03:09.004Z] 19:02:39 INFO - at FirefoxWindowRecorder.stop (/builds/task_1580497051/fetches/browsertime/node_modules/browsertime/lib/video/screenRecording/firefox/firefoxWindowRecorder.js:175:11)
[task 2020-01-31T19:03:09.004Z] 19:02:39 INFO - at Video.stop (/builds/task_1580497051/fetches/browsertime/node_modules/browsertime/lib/video/video.js:69:26)
[task 2020-01-31T19:03:09.004Z] 19:02:39 INFO - [2020-01-31 19:02:39] VERBOSE: [browsertime] Executing privileged async script toggle composition recorder
[task 2020-01-31T19:03:09.004Z] 19:02:39 INFO - 1580497359322 webdriver::server DEBUG -> GET /session/d69b8b35-cbd1-4c86-9b50-d890a53b97e5/moz/context
[task 2020-01-31T19:03:09.004Z] 19:02:39 INFO - 1580497359324 webdriver::server DEBUG <- 200 OK {"value":"content"}
[task 2020-01-31T19:03:09.004Z] 19:02:39 INFO - 1580497359325 webdriver::server DEBUG -> POST /session/d69b8b35-cbd1-4c86-9b50-d890a53b97e5/moz/context {"context":"chrome"}
[task 2020-01-31T19:03:09.004Z] 19:02:39 INFO - 1580497359327 webdriver::server DEBUG <- 200 OK {"value":null}
[task 2020-01-31T19:03:09.004Z] 19:02:39 INFO - 1580497359328 webdriver::server DEBUG -> POST /session/d69b8b35-cbd1-4c86-9b50-d890a53b97e5/execute/async {"script":" const cb = arguments[arguments.length-1];\n Promise.resolve(\n windowUtils.setCompositionRecording(false)\n )\n .then(() => cb());\n ","args":[null]}
[task 2020-01-31T19:03:09.004Z] 19:03:06 INFO - 1580497386132 webdriver::server DEBUG Deleting session
[task 2020-01-31T19:03:09.004Z] 19:03:06 INFO - 1580497386132 geckodriver::android DEBUG Force stopping the Android package: org.mozilla.geckoview_example
[task 2020-01-31T19:03:09.004Z] 19:03:06 INFO - 1580497386132 mozdevice DEBUG Force stopping Android package: org.mozilla.geckoview_example
[task 2020-01-31T19:03:09.004Z] 19:03:06 INFO - 1580497386132 mozdevice DEBUG execute_host_command: >> "host:transport:FA79D1A07331"
[task 2020-01-31T19:03:09.004Z] 19:03:06 INFO - 1580497386132 mozdevice DEBUG execute_host_command: << []
[task 2020-01-31T19:03:09.004Z] 19:03:06 INFO - 1580497386132 mozdevice DEBUG execute_host_command: >> "shell:am force-stop org.mozilla.geckoview_example"
[task 2020-01-31T19:03:09.004Z] 19:03:06 INFO - 1580497386199 mozdevice DEBUG execute_host_command: << ""
[task 2020-01-31T19:03:09.004Z] 19:03:06 INFO - 1580497386199 geckodriver::marionette DEBUG Android package force-stopped
[task 2020-01-31T19:03:09.004Z] 19:03:06 INFO - 1580497386199 mozdevice DEBUG execute_host_command: >> "host:transport:FA79D1A07331"
[task 2020-01-31T19:03:09.004Z] 19:03:06 INFO - 1580497386199 mozdevice DEBUG execute_host_command: << []
[task 2020-01-31T19:03:09.004Z] 19:03:06 INFO - 1580497386199 mozdevice DEBUG execute_host_command: >> "shell:am clear-debug-app org.mozilla.geckoview_example"
[task 2020-01-31T19:03:09.004Z] 19:03:06 INFO - 1580497386223 mozdevice DEBUG execute_host_command: << ""
[task 2020-01-31T19:03:09.004Z] 19:03:06 INFO - 1580497386223 geckodriver::android DEBUG Disabled reading from configuration file
[task 2020-01-31T19:03:09.004Z] 19:03:06 INFO - 1580497386224 geckodriver::android DEBUG Android port forward (44607 -> 2829) stopped
[task 2020-01-31T19:03:09.004Z] 19:03:06 INFO - 1580497386224 webdriver::server DEBUG <- 500 Internal Server Error {"value":{"error":"unknown error","message":"Failed to decode response from marionette","stacktrace":""}}
[task 2020-01-31T19:03:09.004Z] 19:03:06 INFO - [2020-01-31 19:03:06] ERROR: [browsertime] Couldn't execute async script named toggle composition recorder error:WebDriverError: Failed to decode response from marionette
[task 2020-01-31T19:03:09.004Z] 19:03:06 INFO - [2020-01-31 19:03:06] ERROR: [browsertime] WebDriverError: Failed to decode response from marionette
[task 2020-01-31T19:03:09.004Z] 19:03:06 INFO - at Object.throwDecodedError (/builds/task_1580497051/fetches/browsertime/node_modules/browsertime/node_modules/selenium-webdriver/lib/error.js:550:15)
[task 2020-01-31T19:03:09.004Z] 19:03:06 INFO - at parseHttpResponse (/builds/task_1580497051/fetches/browsertime/node_modules/browsertime/node_modules/selenium-webdriver/lib/http.js:563:13)
[task 2020-01-31T19:03:09.004Z] 19:03:06 INFO - at Executor.execute (/builds/task_1580497051/fetches/browsertime/node_modules/browsertime/node_modules/selenium-webdriver/lib/http.js:489:26)
[task 2020-01-31T19:03:09.004Z] 19:03:06 INFO - at process._tickCallback (internal/process/next_tick.js:68:7)
[task 2020-01-31T19:03:09.004Z] 19:03:06 INFO - 1580497386225 webdriver::server DEBUG -> DELETE /session/d69b8b35-cbd1-4c86-9b50-d890a53b97e5
[task 2020-01-31T19:03:09.004Z] 19:03:06 INFO - 1580497386225 webdriver::server DEBUG <- 404 Not Found {"value":{"error":"invalid session id","message":"Tried to run command without establishing a connection","stacktrace":""}}
[task 2020-01-31T19:03:09.004Z] 19:03:06 INFO - 1580497386229 geckodriver DEBUG Listening on 127.0.0.1:54237
and the adb logcat
, which I think shows the OOM killer reaping processes until the tab goes down:
01-31 19:02:39.257 806 806 E android.hardware.power@1.1-service.wahoo: Governor not supported by powerHAL, skipping
01-31 19:02:39.285 6618 6634 I Gecko : 1580497359285 Marionette DEBUG 0 -> [0,18,"WebDriver:ExecuteScript",{"args":[5000],"script":"\nreturn (function(waitTime) {\n try { \n var end = wi ... () > end - start + waitTime);\n } \n catch(e) {\n return true;\n }\n})(arguments[arguments.length - 1]);\n"}]
01-31 19:02:39.290 6618 6634 I Gecko : 1580497359290 Marionette DEBUG 0 <- [1,18,null,{"value":true}]
01-31 19:02:39.293 6618 6634 I Gecko : 1580497359293 Marionette DEBUG 0 -> [0,19,"WebDriver:ExecuteScript",{"args":[],"script":"return document.documentURI;"}]
01-31 19:02:39.297 6618 6634 I Gecko : 1580497359297 Marionette DEBUG 0 <- [1,19,null,{"value":"https://www.amazon.com/"}]
01-31 19:02:39.318 6618 6634 I Gecko : 1580497359318 Marionette DEBUG 0 -> [0,20,"Marionette:GetContext",{}]
01-31 19:02:39.318 6618 6634 I Gecko : 1580497359318 Marionette DEBUG 0 <- [1,20,null,{"value":"content"}]
01-31 19:02:39.321 6618 6634 I Gecko : 1580497359321 Marionette DEBUG 0 -> [0,21,"Marionette:SetContext",{"value":"chrome"}]
01-31 19:02:39.321 6618 6634 I Gecko : 1580497359321 Marionette DEBUG 0 <- [1,21,null,{"value":null}]
01-31 19:02:39.323 6618 6634 I Gecko : 1580497359323 Marionette DEBUG 0 -> [0,22,"WebDriver:ExecuteAsyncScript",{"args":[null],"script":" const cb = arguments[arguments.length-1];\n Promise.resolve(\n windowUtils.setCompositionRecording(false)\n )\n .then(() => cb());\n "}]
01-31 19:02:39.535 806 806 E android.hardware.power@1.1-service.wahoo: Governor not supported by powerHAL, skipping
01-31 19:02:40.264 806 806 I chatty : uid=1000(system) power@1.1-servi identical 5 lines
01-31 19:02:40.536 806 806 E android.hardware.power@1.1-service.wahoo: Governor not supported by powerHAL, skipping
01-31 19:02:49.134 6618 6634 E GeckoConsole: [JavaScript Error: "Polling for changes failed: NetworkError when attempting to fetch resource.." {file: "resource://services-settings/remote-settings.js" line: 251}]
01-31 19:02:49.134 6618 6634 E GeckoConsole: remoteSettingsFunction/remoteSettings.pollChanges@resource://services-settings/remote-settings.js:251:13
01-31 19:02:56.575 2454 2454 E TaskServiceHelper: Cannot execute TaskName.UNKNOWN (or data got corrupted).
01-31 19:02:56.627 4584 6873 W Gcs.WfaEliRefreshTask: [327] h.b: NULL eligibility response.
01-31 19:02:56.928 1177 1556 E BatteryStatsService: no controller energy info supplied
01-31 19:02:56.953 1177 1556 E KernelUidCpuFreqTimeReader: Failed to read /proc/uid_time_in_state: java.io.FileNotFoundException: /proc/uid_time_in_state (No such file or directory)
01-31 19:02:59.241 1177 1188 D WificondControl: Scan result ready event
01-31 19:03:00.028 806 806 E android.hardware.power@1.1-service.wahoo: Governor not supported by powerHAL, skipping
01-31 19:03:04.105 1177 1586 I ActivityManager: Process com.google.android.gms.unstable (pid 3748) has died: cch+4CEM
01-31 19:03:00.082 806 806 E android.hardware.power@1.1-service.wahoo: Governor not supported by powerHAL, skipping
01-31 19:03:04.105 1177 1197 W zygote64: kill(-3748, 9) failed: No such process
01-31 19:03:04.105 1177 1197 I zygote64: Successfully killed process cgroup uid 10017 pid 3748 in 0ms
01-31 19:03:04.106 806 806 E android.hardware.power@1.1-service.wahoo: Governor not supported by powerHAL, skipping
01-31 19:03:04.624 1177 1586 I ActivityManager: Process com.google.android.contacts (pid 5283) has died: cch+6CEM
01-31 19:03:04.159 806 806 E android.hardware.power@1.1-service.wahoo: Governor not supported by powerHAL, skipping
01-31 19:03:04.624 1177 1197 W zygote64: kill(-5283, 9) failed: No such process
01-31 19:03:04.625 1177 1197 I zygote64: Successfully killed process cgroup uid 10051 pid 5283 in 0ms
01-31 19:03:04.625 806 806 E android.hardware.power@1.1-service.wahoo: Governor not supported by powerHAL, skipping
01-31 19:03:04.631 1177 1556 I ActivityManager: Process android.process.acore (pid 5263) has died: cch+6CEM
01-31 19:03:04.631 1177 1197 W zygote64: kill(-5263, 9) failed: No such process
01-31 19:03:04.631 1177 1197 I zygote64: Successfully killed process cgroup uid 10026 pid 5263 in 0ms
01-31 19:03:04.680 806 806 E android.hardware.power@1.1-service.wahoo: Governor not supported by powerHAL, skipping
01-31 19:03:05.288 1177 1556 I ActivityManager: Process com.android.mtp (pid 5995) has died: cch+4CEM
01-31 19:03:05.288 1177 1197 W zygote64: kill(-5995, 9) failed: No such process
01-31 19:03:05.288 1177 1197 I zygote64: Successfully killed process cgroup uid 10003 pid 5995 in 0ms
01-31 19:03:05.289 806 806 E android.hardware.power@1.1-service.wahoo: Governor not supported by powerHAL, skipping
01-31 19:03:05.289 828 828 E lowmemorykiller: Error writing /proc/3031/oom_score_adj; errno=22
01-31 19:03:05.289 1177 1556 I ActivityManager: Process android.process.media (pid 6036) has died: cch+4CEM
01-31 19:03:05.290 1177 1197 W zygote64: kill(-6036, 9) failed: No such process
01-31 19:03:05.290 1177 1197 I zygote64: Successfully killed process cgroup uid 10003 pid 6036 in 0ms
01-31 19:03:05.298 1177 1498 D ConnectivityService: ConnectivityService NetworkRequestInfo binderDied(NetworkRequest [ LISTEN id=12, [ Capabilities: INTERNET&NOT_RESTRICTED&TRUSTED] ], android.os.BinderProxy@72d45a4)
01-31 19:03:05.298 1177 1188 I ActivityManager: Process com.android.vending (pid 3031) has died: cch+6CEM
01-31 19:03:05.299 1177 1197 W zygote64: kill(-3031, 9) failed: No such process
01-31 19:03:05.299 1177 1197 I zygote64: Successfully killed process cgroup uid 10006 pid 3031 in 0ms
01-31 19:03:05.351 806 806 E android.hardware.power@1.1-service.wahoo: Governor not supported by powerHAL, skipping
01-31 19:03:05.645 6618 6715 W google-breakpad: ExceptionHandler::GenerateDump cloned child
01-31 19:03:05.645 6876 6715 W google-breakpad: ExceptionHandler::WaitForContinueSignal waiting for continue signal...
01-31 19:03:05.645 6618 6715 W google-breakpad: 6876
01-31 19:03:05.645 6618 6715 W google-breakpad:
01-31 19:03:05.645 6618 6715 W google-breakpad: ExceptionHandler::SendContinueSignalToChild sent continue signal to child
01-31 19:03:05.685 806 806 E android.hardware.power@1.1-service.wahoo: Governor not supported by powerHAL, skipping
01-31 19:03:05.950 806 806 I chatty : uid=1000(system) power@1.1-servi identical 1 line
01-31 19:03:06.019 806 806 E android.hardware.power@1.1-service.wahoo: Governor not supported by powerHAL, skipping
01-31 19:03:06.126 6657 6729 I Gecko : Exiting due to channel error.
01-31 19:03:06.137 1177 2351 I ActivityManager: Process org.mozilla.geckoview_example:socket (pid 6679) has died: fore TOP
01-31 19:03:06.138 1177 2351 W ActivityManager: Scheduling restart of crashed service org.mozilla.geckoview_example/org.mozilla.gecko.process.GeckoServiceChildProcess$socket in 1000ms
01-31 19:03:06.138 1177 1197 W zygote64: kill(-6679, 9) failed: No such process
01-31 19:03:06.138 1177 1197 I zygote64: Successfully killed process cgroup uid 10120 pid 6679 in 0ms
01-31 19:03:06.140 806 806 E android.hardware.power@1.1-service.wahoo: Governor not supported by powerHAL, skipping
01-31 19:03:06.146 1177 1584 I ActivityManager: Process org.mozilla.geckoview_example:tab (pid 6657) has died: fore TOP
01-31 19:03:06.146 1177 1197 W zygote64: kill(-6657, 9) failed: No such process
01-31 19:03:06.146 1177 1584 W ActivityManager: Scheduling restart of crashed service org.mozilla.geckoview_example/org.mozilla.gecko.process.GeckoServiceChildProcess$tab in 10991ms
01-31 19:03:06.147 1177 1286 W InputDispatcher: channel '3dc7eb9 org.mozilla.geckoview_example/org.mozilla.geckoview_example.GeckoViewActivity (server)' ~ Consumer closed input channel or an error occurred. events=0x9
01-31 19:03:06.148 1177 1586 I ActivityManager: Process org.mozilla.geckoview_example (pid 6618) has died: fore TOP
01-31 19:03:06.148 1177 1497 I WindowManager: WIN DEATH: Window{3dc7eb9 u0 org.mozilla.geckoview_example/org.mozilla.geckoview_example.GeckoViewActivity}
01-31 19:03:06.148 1177 1586 W ActivityManager: Force removing ActivityRecord{82e35aa u0 org.mozilla.geckoview_example/.GeckoViewActivity t5806}: app died, no saved state
01-31 19:03:06.149 1177 1286 E InputDispatcher: channel '3dc7eb9 org.mozilla.geckoview_example/org.mozilla.geckoview_example.GeckoViewActivity (server)' ~ Channel is unrecoverably broken and will be disposed!
01-31 19:03:06.149 1177 1497 W InputDispatcher: Attempted to unregister already unregistered input channel '3dc7eb9 org.mozilla.geckoview_example/org.mozilla.geckoview_example.GeckoViewActivity (server)'
01-31 19:03:06.152 738 738 I Zygote : Process 6618 exited due to signal (11)
note that we are getting some kind of minidump written:
01-31 19:03:05.645 6618 6715 W google-breakpad: ExceptionHandler::GenerateDump cloned child
01-31 19:03:05.645 6876 6715 W google-breakpad: ExceptionHandler::WaitForContinueSignal waiting for continue signal...
01-31 19:03:05.645 6618 6715 W google-breakpad: 6876
01-31 19:03:05.645 6618 6715 W google-breakpad:
01-31 19:03:05.645 6618 6715 W google-breakpad: ExceptionHandler::SendContinueSignalToChild sent continue signal to child
I would very much like somebody to dig into these minidumps and ensure that we really are getting OOM and not some actual crash in Gecko!
Comment 7•5 years ago
|
||
(In reply to Nick Alexander :nalexander [he/him] from comment #5)
An update on this: locally, I have geckodriver pulling minidumps off the device. I still need to wire up the
mozcrash
pieces, but that looks almost trivial once the minidumps are captured.
Hey Nick. Do you have any kind of code here you could share?
I would like to see crash handling support in geckodriver sooner than later. Getting the minidump out of the profile that geckodriver deletes immediatily after the test finished is hard. For a quick workaround I filed bug 1613064, but on the longer term bug 1433495 would be good to have (with support on desktop and mobile).
Updated•5 years ago
|
(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #7)
(In reply to Nick Alexander :nalexander [he/him] from comment #5)
An update on this: locally, I have geckodriver pulling minidumps off the device. I still need to wire up the
mozcrash
pieces, but that looks almost trivial once the minidumps are captured.Hey Nick. Do you have any kind of code here you could share?
Yes.
I would like to see crash handling support in geckodriver sooner than later. Getting the minidump out of the profile that geckodriver deletes immediatily after the test finished is hard. For a quick workaround I filed bug 1613064, but on the longer term bug 1433495 would be good to have (with support on desktop and mobile).
I've posted a patch series (some of which has r?whimboo, although even those bits aren't quite ready for review). That series started with (ato's?) existing work on respecting MINIDUMP_SAVE_PATH
. (For the record, I think using this env var is janky for external consumers; an argument or other configuration mechanism would be better.). Then I built adb pull
support for Android. I'm aware of one serious issue with that functionality: we need to avoid using Path
for on-device paths to avoid tickets like Bug 1609263. (Might as well update adb push
while you're in the neighbourhood: Bug 1610629.) Finally, I started to think about testing, including automated testing, and concluded that it's probably possible but it's tricky because the Raptor + Browsertime harness just isn't very flexible. (Crash test handling looks like functional testing, not performance testing, so there's an impedance mismatch.). The last mile would be to wire up and ensure that the mozcrash
handling works for the Raptor + Browsertime harness variants; once the minidumps are being pulled that should be fairly straight-forward.
Feel free to commandeer everything. Or, if you prefer, we can break out the adb pull
patches into a separate ticket, I can address review comments, and then you can pick up the geckodriver + Raptor + Browsertime bits. Good luck!
Oh crumbs, I put the patch series against Bug 1433495. Oh well, most of it is actually for that functionality and not the Raptor + Browsertime bits.
Mass-removing myself from cc; search for 12b9dfe4-ece3-40dc-8d23-60e179f64ac1 or any reasonable part thereof, to mass-delete these notifications (and sorry!)
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Comment 11•1 year ago
•
|
||
This can now be fixed by setting the MINIDUMP_SAVE_PATH
environment variable to a folder that is used for artifact upload in CI. You can have a look at the wpt test jobs which work fine for that.
Comment 12•1 year ago
|
||
(In reply to Henrik Skupin [:whimboo][⌚️UTC+2] from comment #11)
This can now be fixed by setting the
MINIDUMP_SAVE_PATH
environment variable to a folder that is used for artifact upload in CI. You can have a look at the wpt test jobs which work fine for that.
That's great to know, thanks @whimboo!
@andrej, you had expressed interest in working on this before. As this is a valuable feature missing since we moved to browsertime, I have assigned it to you. Perhaps we could schedule it for an upcoming "sprint"?
Updated•1 year ago
|
Comment 13•1 year ago
•
|
||
:whimboo, I spent a bit of time looking into this today, and I haven't been able to get crashes saved/moved to the correct location. I've set the MINIDUMP_SAVE_PATH
environment variable in multiple locations, but none of them seem to have an effect on where the crash dump file gets moved too or saved in. I am seeing the MINIDUMP_SAVE_PATH in the capabilities in a geckodriver log though (it's also set in the process that the browser is started from):
17:22:43 INFO - raptor-browsertime Info: 1708708963430 webdriver::server DEBUG -> POST /session {"capabilities":{"firstMatch":[{}],"alwaysMatch":{"browserName":"firefox","pageLoadStrategy":"none","moz:fir ... linux-gnu/dist/bin/firefox","args":["-no-remote"],"env":{"MINIDUMP_SAVE_PATH":"/home/sparky/crashes"}},"webSocketUrl":true}}}
I see the minidumps being produced in the temporary profile folder that geckodriver produces, but the folder gets deleted before the minidumps get moved or saved elsewhere.
Do you know what I might be missing here?
Comment 14•1 year ago
|
||
If you are trying to do that for desktop you cannot use the env
capability (this way is only supported on Android atm and we might even remove it). Instead you will have to set the environment variable when calling browsertime internally.
Comment 15•1 year ago
|
||
I tried that as well by setting process.env.MINIDUMP_SAVE_PATH = "/home/sparky/crashes";
in this area of the browsertime firefox builder, and none of the minidumps were copied/moved over (I've been testing with about:crashparent
, and about:crashcontent
): https://github.com/sitespeedio/browsertime/blob/a4dfa25fa21b43d6a287043a0bb5bffe1ded1768/lib/firefox/webdriver/builder.js#L245
I double checked to see if the env variables being set there are being found by firefox by setting --firefox.collectMozLog
since it modifies the same process.env
to get mozlogs collected, and it works fine. Are there any additional flags I might have to pass, or is there a flag that might be preventing this to work?
Comment 16•1 year ago
|
||
That is strange. Do you actually have a full trace log for such a case?
There shouldn't any other additional flags or environment variables be needed. Note that this is purely a Firefox feature and if Firefox has this env variable set you might better want to talk to Gabriele Svelto about it.
Comment 17•1 year ago
|
||
Yup, I've attached the log of a test run. In this one I navigated to about:crashparent
.
Comment 18•1 year ago
|
||
:gsvelto, do you know what might be happening here? We're trying to set the MINIDUMP_SAVE_PATH
variable to save the minidumps from browsertime test runs (uses selenium), but it doesn't seem to have an effect.
As I mentioned in comment #15 above, we're able to get environment variables set for the firefox process (tested with mozlogs). I checked to see if this was a local issue as well by running about:crashparent
in a talos test, and I'm able to get the crash generated there.
I've attached a file with logs from geckodriver, but let me know if you need any additional logs/info.
Updated•1 year ago
|
Comment 19•1 year ago
|
||
:gsvelto, see comment #18.
Comment 20•1 year ago
|
||
I checked the log file and I cannot actually see a navigation to about:crashparent
but about:blank
. So not sure which action actually triggered the crash. Does the same also happen for about:crashcontent
? In such a case Marionette is shutting down the browser.
Comment 21•1 year ago
•
|
||
Ah sorry, I should have mentioned that I navigated to about:crashparent
through the URL bar. In this log, I changed the about:blank
navigation to about:crashcontent
. The logs are a bit different with about:crashcontent
, but the minidumps don't get moved/copied to the MINIDUMP_SAVE_PATH folder either.
Comment 22•1 year ago
|
||
I'm not familiar with how geckodriver works, but MINIDUMP_SAVE_PATH
is honored by mozcrash.py if no dump_save_path
argument has been passed. Is it possible that mozcrash is being invoked with a different dump_save_path
argument? If that's the case then MINIDUMP_SAVE_PATH
will be ignored (see here).
Comment 23•1 year ago
|
||
Oh interesting, thanks for the info! I can't get to that point because the temporary profile that's generated, and contains the minidumps is deleted by the time I have a chance to call check_for_crashes
in the raptor-browsertime code. The browser gets started, and stopped in a node.js process that's separate from the code that would call mozcrash.
It sounds like I'd need to catch the crash within browsertime, and move the minidumps there, or something else along those lines.
Comment 24•1 year ago
|
||
Note that MINIDUMP_SAVE_PATH
exists in combination with geckodriver to exactly not be affected by the profile removal. geckodriver does not have any internals like what mozcrash has, so minidumps need to be checked independently. As such use a folder for MINIDUMP_SAVE_PATH
which clearly is created by Raptor, and can then be checked and deleted as well.
Comment 25•1 year ago
|
||
I was able to get it working for some cases in browsertime. Afaict, the reason wpt captures them when MINIDUMP_SAVE_PATH is used, is because it starts/runs the firefox process through mozrunner which has logic for capturing the minidumps. One thing I'm wondering is if the documentation here is still correct, I tried running a basic test similar to what's documented there where I open a browser, and crash it but the minidumps don't get saved: https://firefox-source-docs.mozilla.org/testing/geckodriver/CrashReports.html
For browsertime, I had to add some code here to pull the minidumps into the save path right before we quit the driver: https://github.com/sitespeedio/browsertime/blob/0af4e411159185ea7ec5f258564df54ad1085d24/lib/core/seleniumRunner.js#L527
However, an issue I'm hitting now for the majority of cases is that when the browser crashes, browsertime might be waiting to check if the page load completed. The result of this is that the webdriver tears down the session before we can check for minidumps. Here's an example output where the browser crashes, then a page complete check that's done from a driver.wait
call triggers the teardown:
# navigated to about:crashcontent
13:30:41 INFO - raptor-browsertime Info: 1709040641863 Marionette TRACE [14] MarionetteCommands actor destroyed for window id 30
13:30:41 INFO - raptor-browsertime Info: 1709040641864 webdriver::server DEBUG <- 200 OK {"value":null}
13:30:41 INFO - raptor-browsertime Info: 1709040641870 Marionette DEBUG Closed connection 0
13:30:41 INFO - raptor-browsertime Info: console.error: ({})
# browser is crashed now, then the next line is the page load complete check being done in the driver.wait method which destroys the temp profile folder
13:30:42 INFO - raptor-browsertime Info: 1709040642867 webdriver::server DEBUG -> POST /session/88a8f5af-1fc6-473b-97eb-ab89a906527a/execute/sync {"script":"\nreturn (function(waitTime) {\n try { \n var end = window.performance.timing.loadEventEnd;\n var start= window.performance.timing.navigationStart;\n return (end > 0) && (performance.now() > end - start + waitTime);\n } \n catch(e) {\n return true;\n }\n})(arguments[arguments.length - 1]);\n","args":[5000]}
13:30:42 INFO - raptor-browsertime Info: 1709040642867 webdriver::server DEBUG Teardown session
13:30:42 INFO - raptor-browsertime Info: 1709040642876 geckodriver::marionette ERROR Failed to close browser connection: Socket not connected (os error 107)
# temp profile directory with minidumps is now deleted
Here's the line where this happens at, but the teardown gets triggered on any driver calls from what I noticed: https://github.com/sitespeedio/browsertime/blob/0af4e411159185ea7ec5f258564df54ad1085d24/lib/core/seleniumRunner.js#L189-L194
:whimboo, would you have any ideas about how I could get around this or know of some way to prevent the session teardown? Alternatively, I'm thinking of finding a way to check if the browser process is still alive, and prevent driver calls when it's not.
Comment 26•1 year ago
|
||
If the minidump files have not been saved then I assume that MINIDUMP_SAVE_PATH
is not correctly set for the Firefox process. Once a crash happened the crash reporter should automatically save the minidump in that folder and not within the minidumps
subfolder of the used Firefox profile.
When the browser crashes we should return the exit code of the browser, which could be used to determine if the (unexpected) shutdown was a normal exit or not. The exit code of a content process crash will be 245 (since my fix on bug 1370520 landed).
Comment 27•1 year ago
|
||
Can you point me to the code that does this? Maybe that will help me debug this further. I might be missing something here, but based on :gsvelto's comment, it sounds like it's mozcrash that does the saving/capturing, and not something within Firefox.
Thanks for the tip regarding the exit code, I'll see if I can use that to move forward.
Comment 28•1 year ago
|
||
Oh no. Now I see. I completely missed that. I was really thinking that the crash reporter could save the minidumps directly into that folder. I filed bug 1882338 to really implement it in geckodriver. Sorry again!
Gabriele, would it be possible to instruct the crash reporter (maybe via such an environment variable) to not use the default minidump folder? I'm happy to file a bug if wanted and possible.
Comment 29•1 year ago
|
||
No worries, thanks for taking a look! Glad we figured it out :)
Comment 30•1 year ago
|
||
(In reply to Henrik Skupin [:whimboo][⌚️UTC+1] from comment #28)
Gabriele, would it be possible to instruct the crash reporter (maybe via such an environment variable) to not use the default minidump folder? I'm happy to file a bug if wanted and possible.
If you can access JS we already have the minidumpPath
property which can be used to set the path where minidumps are written. See how we do it for xpcshell tests.
Comment 31•1 year ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #30)
If you can access JS we already have the
minidumpPath
property which can be used to set the path where minidumps are written. See how we do it for xpcshell tests.
Oh that is neat! I didn't know about that. It's neat and would make it especially much easier for us to pull the minidump files from a remote device by not having to take care of the actual profile cleanup. Thanks!
Comment 32•1 year ago
|
||
Hi Gabriele, thanks for the pointers and we are looking at solutions using JS to set minidumpPath
.
But in case of a very early crash, there is still a risk that we might not set minidumpPath
in time. Do you think we could add a preference to set the default minidump path? This way geckodriver could set this preference in the profile before starting the browser.
Comment 33•1 year ago
|
||
It would be rather easy. When we set the exception handler we use a temporary location for putting the minidumps which is later changed to the final location. We could set that to the contents of an environment variable, but not a preference, it's too early in the startup process for that.
That being said the temporary path is stored in minidumpPath
, so the harness could read it and check for crashes under that path before setting it to the desired location.
Comment 34•11 months ago
|
||
Please note that I've moved the recent conversation to bug 1884695 comment 1, where it's better suited. It's not really related to raptor (at least at the discussed level).
Description
•