Intermittent ts_paint_webext/ts_paint | application crashed [unknown top frame]
Categories
(Testing :: Talos, defect, P3)
Tracking
(Not tracked)
People
(Reporter: intermittent-bug-filer, Unassigned)
References
Details
(Keywords: intermittent-failure, Whiteboard: [stockwell needswork:owner])
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
||
Comment 4•8 years ago
|
||
Comment hidden (Intermittent Failures Robot) |
![]() |
||
Comment 6•8 years ago
|
||
Comment hidden (Intermittent Failures Robot) |
![]() |
||
Comment 8•7 years ago
|
||
![]() |
||
Comment 9•7 years ago
|
||
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 12•7 years ago
|
||
![]() |
||
Comment 13•7 years ago
|
||
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 18•7 years ago
|
||
Comment 19•7 years ago
|
||
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 27•7 years ago
|
||
![]() |
||
Comment 28•6 years ago
|
||
New occurrence:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=289863882&repo=autoland&lineNumber=2184
Comment hidden (Intermittent Failures Robot) |
Comment 30•6 years ago
|
||
Gabrielle, is that the same issue again as filed as bug 1614931?
Comment 31•6 years ago
|
||
No, this is worse. The only minidump I found is completely empty. There is no .extra file either. Somehow we failed in a way that was so catastrophic that we didn't write anything out.
Comment hidden (Intermittent Failures Robot) |
Comment 34•6 years ago
|
||
![]() |
||
Comment 35•5 years ago
|
||
New occurrence:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=305181390&repo=mozilla-central&lineNumber=1242
Comment hidden (Intermittent Failures Robot) |
Comment 37•5 years ago
|
||
In the past week there were 27 failures on windows10-64-shippable opt.
Recent log:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=306259778&repo=mozilla-central&lineNumber=966
[task 2020-06-13T23:34:00.722Z] 23:34:00 INFO - TEST-INFO | started process 708 (C:\Users\task_1592081630\build\application\firefox\firefox -wait-for-browser -no-deelevate -profile c:\users\task_1592081630\appdata\local\temp\tmpei5iw6\profile http://127.0.0.1:49796/startup_test/tspaint_test.html?gecko_profile_entries=10000000&gecko_profile_threads=GeckoMain%2CCompositor&gecko_profile_dir=c%3A%5Cusers%5Ctask_1592081630%5Cappdata%5Clocal%5Ctemp%5Ctmpq3er2o&gecko_profile_interval=1 http://127.0.0.1:49796/startup_test/tspaint_test.html?gecko_profile_entries=10000000&gecko_profile_threads=GeckoMain%2CCompositor&gecko_profile_dir=c%3A%5Cusers%5Ctask_1592081630%5Cappdata%5Clocal%5Ctemp%5Ctmpq3er2o&gecko_profile_interval=1)
[task 2020-06-13T23:34:02.377Z] 23:34:02 INFO - PID 708 | __start_report396__end_report
[task 2020-06-13T23:34:02.377Z] 23:34:02 INFO - PID 708 |
[task 2020-06-13T23:34:02.379Z] 23:34:02 INFO - PID 708 | __startTimestamp1592091242377__endTimestamp
[task 2020-06-13T23:34:10.390Z] 23:34:10 INFO - Browser shutdown timed out after 5 seconds, killing process.
[task 2020-06-13T23:34:10.392Z] 23:34:10 INFO - Launcher process psutil.Process(pid=708L, name='firefox.exe', started='23:34:00') detected. Killing parent process psutil.Process(pid=1732, name='firefox.exe', started='23:34:00') instead.
[task 2020-06-13T23:34:10.392Z] 23:34:10 INFO - Killing psutil.Process(pid=1732, name='firefox.exe', started='23:34:00') and writing a minidump file
[task 2020-06-13T23:34:10.392Z] 23:34:10 INFO - mozcrash Writing a dump to c:\users\task_1592081630\appdata\local\temp\tmpei5iw6\profile\minidumps\421ddf84-4036-48c9-8871-9ba97819b5b8.dmp for [1732]
[task 2020-06-13T23:34:10.470Z] 23:34:10 INFO - PID 708 | Exiting due to channel error.
[task 2020-06-13T23:34:10.515Z] 23:34:10 INFO - mozcrash checking c:\users\task_1592081630\appdata\local\temp\tmpei5iw6\profile\minidumps for minidumps...
[task 2020-06-13T23:34:10.515Z] 23:34:10 INFO - mozcrash Downloading symbols from: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/P_lm41ksRCGdvUqP331FwQ/artifacts/public/build/target.crashreporter-symbols.zip
[task 2020-06-13T23:34:11.963Z] 23:34:11 INFO - mozcrash Copy/paste: C:/Users/task_1592081630/fetches\minidump_stackwalk\minidump_stackwalk.exe c:\users\task_1592081630\appdata\local\temp\tmpei5iw6\profile\minidumps\421ddf84-4036-48c9-8871-9ba97819b5b8.dmp c:\users\task_1592081630\appdata\local\temp\tmp1bhlxm
[task 2020-06-13T23:34:16.838Z] 23:34:16 INFO - mozcrash Saved minidump as C:\Users\task_1592081630\build\blobber_upload_dir\421ddf84-4036-48c9-8871-9ba97819b5b8.dmp
[task 2020-06-13T23:34:16.838Z] 23:34:16 INFO - PROCESS-CRASH | ts_paint_webext | application crashed [unknown top frame]
Comment hidden (Intermittent Failures Robot) |
Comment 40•5 years ago
|
||
See :gsvelto's comment here: https://bugzilla.mozilla.org/show_bug.cgi?id=981641#c20
:gbrown would you be able to look into this issue? It sounds like it's coming from mozcrash based on :gsvelto's comment.
Comment 41•5 years ago
•
|
||
FYI I've pulled down a few minidumps and had a look:
- All of them were of the parent process
- All of them had different stacks, so the parent process was killed at a different point every time
- Looking at the logs it seems that the test never started, have a look at this snippet:
[task 2020-06-11T11:17:02.484Z] 11:17:02 INFO - TEST-INFO | started process 8188 (C:\Users\task_1591864905\build\application\firefox\firefox -wait-for-browser -no-deelevate -profile c:\users\task_1591864905\appdata\local\temp\tmpbplloy\profile http://127.0.0.1:49799/startup_test/tspaint_test.html?gecko_profile_entries=10000000&gecko_profile_threads=GeckoMain%2CCompositor&gecko_profile_dir=c%3A%5Cusers%5Ctask_1591864905%5Cappdata%5Clocal%5Ctemp%5Ctmpetgzbj&gecko_profile_interval=1 http://127.0.0.1:49799/startup_test/tspaint_test.html?gecko_profile_entries=10000000&gecko_profile_threads=GeckoMain%2CCompositor&gecko_profile_dir=c%3A%5Cusers%5Ctask_1591864905%5Cappdata%5Clocal%5Ctemp%5Ctmpetgzbj&gecko_profile_interval=1)
[task 2020-06-11T11:17:08.023Z] 11:17:08 INFO - PID 8188 | BROWSER FAILED TO GENERATE MOZAFTERPAINT IN 5 SECONDS
[task 2020-06-11T11:17:08.023Z] 11:17:08 INFO - PID 8188 | __startTimestamp1591874228020__endTimestamp
[task 2020-06-11T11:17:16.069Z] 11:17:16 INFO - Browser shutdown timed out after 5 seconds, killing process.
Connecting the dots above it seems that the main process might have been slow to start and the harness killed it after 5 seconds w/o a first paint event. Not knowing how the harness works my guess is that the main process might not be stuck, just very very slow starting up. It might be interesting to bump that timeout and see if the issue goes away. As to why the main process is taking so long to start up I have no idea unfortunately.
![]() |
||
Comment 42•5 years ago
|
||
At a glance, I'm not sure there is anything wrong with mozcrash, or the crash reports produced here. Most of these failures are timeouts and mozcrash forces minidump generation and kills firefox. Of course there is no crashing thread in that scenario, and there may be more threads just running in OS code...maybe that is causing some confusion? Looking through a few crash reports, most seem to have some threads running firefox code, and the stacks are symbolicated and appear consistent...if not very helpful. Unhelpful stacks may also be related to the scenario called out in comment 41: if firefox is still starting up, hits a timeout and is shutting down when the minidump is generated, the crash report may not be very interesting.
![]() |
||
Comment 43•5 years ago
|
||
To follow up on comment 41, I pushed to try with longer timeouts, concentrating on the Windows 10 T(prof) tests:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=334430599e41c198ddad4289f79a89285216b076
Increasing wait_for_quit_timeout from 5 to 30 seconds seems to avoid many of the crash-on-shutdown-timeout occurrences.
Increasing the MozAfterPaint timeout from 5 to 120 seconds does not seem to help at all: there are still frequent timeouts after 120 seconds.
Even with both changes, these tasks remain near perma-fail.
Comment 44•5 years ago
|
||
Maybe push another try with toolkit.asyncshutdown.log
enabled. It might give us some ideas what's hanging during startup/shutdown.
Comment hidden (Intermittent Failures Robot) |
![]() |
||
Comment 46•5 years ago
|
||
There are 37 total failures in the last 7 days: https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2020-06-19&endday=2020-06-26&tree=trunk&bug=1439522
Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=307623880&repo=mozilla-central&lineNumber=2087
Comment hidden (Intermittent Failures Robot) |
Comment 48•5 years ago
|
||
As it looks like with the fix on bug 1630448 the number of crashes have been reduced a lot or are even gone. There is only one instance on June 26th. It shows a problem with minidump stackwalk:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=307623880&repo=mozilla-central
[task 2020-06-26T04:57:37.276Z] 04:57:37 INFO - 2020-06-26 04:57:37: minidump_stackwalk.cc:135: ERROR: Minidump c:\users\task_1593134568\appdata\local\temp\tmpm79dej\profile\minidumps\eb95f136-02c9-4301-abda-2fe6ef2a28d8.dmp could not be read
[task 2020-06-26T04:57:37.276Z] 04:57:37 INFO - minidump_stackwalk exited with return code 1
The minidump is only 64 bytes in size. Looks like was a problem in saving it? Gabriele, any idea?
I would say we close this bug.
Comment 49•5 years ago
|
||
The stackwalker couldn't analyze the minidump because it's empty. That's interesting because when this happens we often attributed it to an issue in Gecko, but this is a minidump that was generated by mozcrash. mozcrash just calls the appropriate Windows API which means that the failure to write the minidump didn't happen in our code... which is encouraging. Either way yeah, I think we can close this bug.
Updated•5 years ago
|
Description
•