Closed Bug 1351166 Opened 7 years ago Closed 6 years ago

Intermittent tscrollx, tp5o_scroll, tp6_google, tsvgx, tp5n, tp6_facebook_heavy, tp6_amazon_heavy, tp6_google_heavy, tp5o_webext, tpaint | Found crashes after test run, terminating test

Categories

(Testing :: Talos, defect)

Version 3
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: intermittent-bug-filer, Unassigned)

References

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell unknown])

With the exception of a 2-hour period yesterday, this has had no instances in the last 2 weeks; yesterday's failures were fallout from a bad patch that we backed out.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
Summary: Intermittent tscrollx | Found crashes after test run, terminating test → Intermittent tscrollx, tp5o_scroll, tp6_google, tsvgx, tp5n, tp6_facebook_heavy, tp6_amazon_heavy | Found crashes after test run, terminating test
Summary: Intermittent tscrollx, tp5o_scroll, tp6_google, tsvgx, tp5n, tp6_facebook_heavy, tp6_amazon_heavy | Found crashes after test run, terminating test → Intermittent tscrollx, tp5o_scroll, tp6_google, tsvgx, tp5n, tp6_facebook_heavy, tp6_amazon_heavy, tp6_google_heavy, tp5o_webext, tpaint | Found crashes after test run, terminating test
In the last 7 days, there have been 33 failures.

Most of the failures are on windows10-64, some of them are on Windows 7 and a few on OS X 10.10 and Linux x64.
Affected build types: pgo and opt

An example of a recent log file:
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=152521027&lineNumber=3674

And the relevant part of the log:
19:24:27     INFO -  PID 844 | Unable to read VR Path Registry from C:\Users\cltbld\AppData\Local\openvr\openvrpaths.vrpath
19:24:27     INFO -  PID 844 | [Parent 844, Gecko_IOThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
19:24:27     INFO -  PID 844 | Unable to read VR Path Registry from C:\Users\cltbld\AppData\Local\openvr\openvrpaths.vrpath
19:24:27     INFO -  PID 844 | [Child 7476, Chrome_ChildThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
19:24:27     INFO -  PID 844 | [Child 7476, Chrome_ChildThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
19:24:27     INFO -  PID 844 | [Parent 844, Gecko_IOThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
19:24:27     INFO -  PID 844 | [Child 488, Chrome_ChildThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
19:24:27     INFO -  PID 844 | [Child 488, Chrome_ChildThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
19:24:27     INFO -  PID 844 | Unable to read VR Path Registry from C:\Users\cltbld\AppData\Local\openvr\openvrpaths.vrpath
19:24:27     INFO -  PID 844 | [Child 7376, Chrome_ChildThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
19:24:27     INFO -  PID 844 | [Child 7376, Chrome_ChildThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
19:24:27     INFO -  PID 844 | *** UTM:SVC TimerManager:registerTimer called after profile-before-change notification. Ignoring timer registration for id: telemetry_modules_ping
19:24:28     INFO -  PID 844 | Unable to read VR Path Registry from C:\Users\cltbld\AppData\Local\openvr\openvrpaths.vrpath
19:24:28     INFO -  PID 844 | [GPU 9172, Chrome_ChildThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
19:24:28     INFO -  PID 844 |
19:24:28     INFO -  PID 844 | ###!!! [Child][MessageChannel::SendAndWait] Error: Channel error: cannot send/recv
19:24:28     INFO -  PID 844 |
19:24:29     INFO -  TEST-INFO | 844: exit 0
19:24:29     INFO -  mozcrash Downloading symbols from: https://queue.taskcluster.net/v1/task/PPsfF3LHSJi5jVrv5INpsg/artifacts/public/build/target.crashreporter-symbols.zip
19:24:32     INFO -  mozcrash Copy/paste: C:\slave\test\build\win32-minidump_stackwalk.exe c:\users\cltbld\appdata\local\temp\tmpdl0jal\profile\minidumps\6e2fce43-5cf5-478f-b4e1-f1143e359bef-browser.dmp c:\users\cltbld\appdata\local\temp\tmpwb1seg
19:24:37     INFO -  mozcrash Saved minidump as C:\slave\test\build\blobber_upload_dir\6e2fce43-5cf5-478f-b4e1-f1143e359bef-browser.dmp
19:24:37     INFO -  PROCESS-CRASH | tpaint | application crashed [@ google_breakpad::ExceptionHandler::WriteMinidump()]
19:24:37     INFO -  Crash dump filename: c:\users\cltbld\appdata\local\temp\tmpdl0jal\profile\minidumps\6e2fce43-5cf5-478f-b4e1-f1143e359bef-browser.dmp
19:24:37     INFO -  Operating system: Windows NT
19:24:37     INFO -                    10.0.15063
19:24:37     INFO -  CPU: amd64
19:24:37     INFO -       family 6 model 30 stepping 5
19:24:37     INFO -       8 CPUs
19:24:37     INFO -  GPU: UNKNOWN
19:24:37     INFO -  Crash reason:  EXCEPTION_NONCONTINUABLE_EXCEPTION
19:24:37     INFO -  Crash address: 0x0
19:24:37     INFO -  Process uptime: 11 seconds
19:24:37     INFO -  Thread 0 (crashed)
19:24:37     INFO -   0  xul.dll!google_breakpad::ExceptionHandler::WriteMinidump() [exception_handler.cc:39f70372b66c : 740 + 0x0]
19:24:37     INFO -      rax = 0x00000073835faa70   rdx = 0x0000000000000000
19:24:37     INFO -      rcx = 0x00000073835faa70   rbx = 0x00000073835fafb0
19:24:37     INFO -      rsi = 0x00000073835fb4c0   rdi = 0x0000000000000000
19:24:37     INFO -      rbp = 0x00000073835fb230   rsp = 0x00000073835fa9a0
19:24:37     INFO -       r8 = 0x0000000000000000    r9 = 0x0000000000000000
19:24:37     INFO -      r10 = 0x0000000000000040   r11 = 0x00000073835faa70
19:24:37     INFO -      r12 = 0x0000000000000000   r13 = 0x00000073835fb4c0
19:24:37     INFO -      r14 = 0x0000000000000000   r15 = 0x0000000000000aac
19:24:37     INFO -      rip = 0x00007ffa6937fe7a
19:24:37     INFO -      Found by: given as instruction pointer in context


:rwood As you are the triage owner of this component, could you please take a look at this?
Thank you!
Flags: needinfo?(rwood)
Whiteboard: [stockwell needswork]
:jimm, this looks like a legit crash that is happening in tpaint (and other places); would you be able to have a look (or forward it on to someone who could)? Thanks!
Flags: needinfo?(rwood) → needinfo?(jmathies)
keep in mind the crash is here:
@ google_breakpad::ExceptionHandler::WriteMinidump()

possibly we have a bad format for the minidump file or need more symbols/data/libraries?

:ted, do you have advice here?
Flags: needinfo?(jmathies) → needinfo?(ted)
If you scroll down in the log in comment 20 you'll see:
19:24:37     INFO -   2  xul.dll!CrashReporter::CreateMinidumpsAndPair(void *,unsigned long,nsTSubstring<char> const &,nsIFile *,nsIFile * *,std::function<void > &&,bool) [nsExceptionHandler.cpp:39f70372b66c : 3963 + 0x16]

This stack is from the browser process writing a minidump of itself because the content process timed out and it is going to kill it.

If you scroll down to the next PROCESS-CRASH line you'll see:
19:24:42     INFO -  PROCESS-CRASH | tpaint | application crashed [@ AslHashFree + 0x32348]

which is the child process. That signature looks kind of bogus, and the stack is weird, but it looks like it's probably running JavaScript code.
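For anyone triaging these logs, here is a minimal sketch of pulling out every PROCESS-CRASH line so the parent and child crashes can be looked at side by side. The function names are hypothetical, the regex is only based on the log lines pasted in this bug, and the WriteMinidump check is a heuristic taken from the explanation above, not something mozcrash provides.

```python
import re

# Pattern inferred from the PROCESS-CRASH lines pasted in this bug, e.g.
#   PROCESS-CRASH | tpaint | application crashed [@ AslHashFree + 0x32348]
PROCESS_CRASH_RE = re.compile(
    r"PROCESS-CRASH \| (?P<test>[^|]+?) \| application crashed "
    r"\[(?P<signature>.*)\]\s*$"
)


def find_process_crashes(log_text):
    """Return (test, signature) pairs for each PROCESS-CRASH line, in log order."""
    crashes = []
    for line in log_text.splitlines():
        match = PROCESS_CRASH_RE.search(line)
        if match is None:
            continue
        signature = match.group("signature")
        # Drop the leading "@ " marker when a top frame was resolved.
        if signature.startswith("@ "):
            signature = signature[2:]
        crashes.append((match.group("test"), signature))
    return crashes


def looks_like_parent_self_dump(signature):
    """Heuristic (an assumption): a WriteMinidump top frame suggests the
    browser process dumped itself before killing a hung content process."""
    return "ExceptionHandler::WriteMinidump" in signature
```

If the first crash in a log matches looks_like_parent_self_dump, the next PROCESS-CRASH entry is probably the more interesting one to read.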
Flags: needinfo?(ted)
Sorry, should have elaborated on that last bit. In the content process stack, everything below:
19:24:42     INFO -  11  xul.dll!nsAppShell::ProcessNextNativeEvent(bool) [nsAppShell.cpp:39f70372b66c : 473 + 0x8]

is just "spinning the event loop". Up a few frames from that is:
19:24:42     INFO -   9  xul.dll!mozilla::CycleCollectedJSContext::PerformMicroTaskCheckPoint() [CycleCollectedJSContext.cpp:39f70372b66c : 530 + 0xc]

This seems to indicate there's JS running. Sadly, we don't currently have any way to get the JS stack when we crash.
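A rough sketch of the same reading done mechanically: parse the stack frame lines out of the log and check whether a PerformMicroTaskCheckPoint frame sits above the event-loop frame. The Frame type, both function names, and the regex are hypothetical and only modelled on the frame lines quoted here; the sketch also assumes the text holds a single thread's stack, since frame indices restart for each thread.

```python
import re
from typing import List, NamedTuple


class Frame(NamedTuple):
    index: int
    module: str
    function: str


# Modelled on lines such as:
#   INFO -  11  xul.dll!nsAppShell::ProcessNextNativeEvent(bool) [nsAppShell.cpp:... : 473 + 0x8]
FRAME_RE = re.compile(
    r"INFO -\s+(?P<index>\d+)\s+(?P<module>[^!\s]+)!(?P<function>.+?)"
    r"(?: \[[^\]]*\])?\s*$"
)


def parse_frames(stack_text: str) -> List[Frame]:
    """Extract (index, module, function) from symbolicated frame lines."""
    frames = []
    for line in stack_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append(
                Frame(int(match.group("index")),
                      match.group("module"),
                      match.group("function"))
            )
    return frames


def microtask_above_event_loop(frames: List[Frame]) -> bool:
    """True if a microtask checkpoint frame is nearer the top of the stack
    (lower index) than the ProcessNextNativeEvent frame."""
    event_loop = next(
        (f.index for f in frames if "ProcessNextNativeEvent" in f.function),
        None,
    )
    if event_loop is None:
        return False
    return any(
        "PerformMicroTaskCheckPoint" in f.function and f.index < event_loop
        for f in frames
    )
```

This only hints that JS was running; it does not recover the JS stack itself.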
This bug has failed 37 times in the last 7 days on OS X 10.10, Windows 10 and Windows 7 affecting opt and pgo build types.
Failing tests: opt-talos-h2-e10s, talos-tp6-stylo-threads-e10.

Recent log link: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-central&job_id=159969461&lineNumber=1483

Part of that log:
15:40:58     INFO -  mozcrash Downloading symbols from: https://queue.taskcluster.net/v1/task/amjYJBfkSWS4wy2sRuf9nA/artifacts/public/build/target.crashreporter-symbols.zip
15:41:06     INFO -  mozcrash Copy/paste: /Users/cltbld/tasks/task_1517517009/build/macosx64-minidump_stackwalk /var/folders/78/_z31zj_d0pj4lr8c3t_wcnkw00000w/T/tmpwlvFD4/profile/minidumps/556663C9-A70E-4B1F-B204-1E931068F106.dmp /var/folders/78/_z31zj_d0pj4lr8c3t_wcnkw00000w/T/tmpeFUGdZ
15:41:06     INFO -  mozcrash Saved minidump as /Users/cltbld/tasks/task_1517517009/build/blobber_upload_dir/556663C9-A70E-4B1F-B204-1E931068F106.dmp
15:41:09     INFO -  PROCESS-CRASH | tp6_google_heavy | application crashed [unknown top frame]
15:41:09     INFO -  Crash dump filename: /var/folders/78/_z31zj_d0pj4lr8c3t_wcnkw00000w/T/tmpwlvFD4/profile/minidumps/556663C9-A70E-4B1F-B204-1E931068F106.dmp
 
ni: rwood,ted.mielczarek do you guys have any updates on this bug?
Flags: needinfo?(ted)
Flags: needinfo?(rwood)
Whiteboard: [stockwell unknown] → [stockwell needswork]
As a note, the talos-h2 jobs are off on OS X and Windows right now, but we still have tp6 crashes, although fewer. Ideally we can figure out whether these are crashes or timeouts and split them up more logically.
(In reply to Arthur Iakab [arthur_iakab] from comment #31)
> 15:41:06     INFO -  mozcrash Saved minidump as /Users/cltbld/tasks/task_1517517009/build/blobber_upload_dir/556663C9-A70E-4B1F-B204-1E931068F106.dmp
> 15:41:09     INFO -  PROCESS-CRASH | tp6_google_heavy | application crashed [unknown top frame]

This minidump is zero bytes, which means we failed to write a minidump for some reason. This may or may not be the same issue observed previously.
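A quick way to spot this case before feeding dumps to minidump_stackwalk is to check file sizes in the dump directory. This is a minimal sketch; the function name is hypothetical and it assumes the dumps carry a .dmp extension, as in the logs above.

```python
import os


def find_empty_minidumps(directory):
    """Return sorted paths of zero-byte .dmp files directly inside directory."""
    empty = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".dmp"):
            continue
        path = os.path.join(directory, name)
        # A zero-byte file means the dump was created but never written.
        if os.path.isfile(path) and os.path.getsize(path) == 0:
            empty.append(path)
    return empty
```

Any path it returns will yield no usable stack, which would match the "[unknown top frame]" signature quoted above.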
Flags: needinfo?(ted)
The tp6_google failure looks to be the same as Bug 1439979
Flags: needinfo?(rwood)
https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup
Status: REOPENED → RESOLVED
Closed: 7 years ago → 6 years ago
Resolution: --- → INCOMPLETE
New failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=181598206&repo=autoland&lineNumber=873
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup
Status: REOPENED → RESOLVED
Closed: 6 years ago → 6 years ago
Resolution: --- → INCOMPLETE