Closed Bug 1630448 Opened 5 years ago Closed 5 years ago

Intermittent PROCESS-CRASH | <talos> | application crashed [unknown top frame]

Categories

(Core :: General, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla79
Tracking Status
firefox79 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: mozbugz)

References

(Blocks 1 open bug)

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell unknown])

Attachments

(2 files)

Filed by: nerli [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=297670097&repo=mozilla-beta
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/TDxBFZw1SvObFYncAfHsOQ/runs/0/artifacts/public/logs/live_backing.log


[task 2020-04-15T05:41:46.485Z] 05:41:46 INFO - mozcrash Copy/paste: C:/Users/task_1586921765/fetches\minidump_stackwalk\minidump_stackwalk.exe c:\users\task_1586921765\appdata\local\temp\tmphkqtzu\profile\minidumps\356ec897-dfc0-4382-b2d9-9713a2ac2c0d.dmp c:\users\task_1586921765\appdata\local\temp\tmprku8_i
[task 2020-04-15T05:41:52.149Z] 05:41:52 INFO - mozcrash Saved minidump as C:\Users\task_1586921765\build\blobber_upload_dir\356ec897-dfc0-4382-b2d9-9713a2ac2c0d.dmp
[task 2020-04-15T05:41:52.149Z] 05:41:52 INFO - PROCESS-CRASH | damp | application crashed [unknown top frame]
[task 2020-04-15T05:41:52.150Z] 05:41:52 INFO - Crash dump filename: c:\users\task_1586921765\appdata\local\temp\tmphkqtzu\profile\minidumps\356ec897-dfc0-4382-b2d9-9713a2ac2c0d.dmp
[task 2020-04-15T05:41:52.150Z] 05:41:52 INFO - Operating system: Windows NT
[task 2020-04-15T05:41:52.150Z] 05:41:52 INFO - 10.0.15063
[task 2020-04-15T05:41:52.150Z] 05:41:52 INFO - CPU: amd64
[task 2020-04-15T05:41:52.151Z] 05:41:52 INFO - family 6 model 94 stepping 3
[task 2020-04-15T05:41:52.151Z] 05:41:52 INFO - 8 CPUs
[task 2020-04-15T05:41:52.151Z] 05:41:52 INFO - GPU: UNKNOWN
[task 2020-04-15T05:41:52.151Z] 05:41:52 INFO - No crash
[task 2020-04-15T05:41:52.152Z] 05:41:52 INFO - Process uptime: 3600 seconds
[task 2020-04-15T05:41:52.152Z] 05:41:52 INFO - Thread 0
[task 2020-04-15T05:41:52.152Z] 05:41:52 INFO - 0 ntdll.dll!NtWaitForAlertByThreadId + 0x14
[task 2020-04-15T05:41:52.153Z] 05:41:52 INFO - rax = 0x00000000000001c2 rdx = 0x0000000000000000
[task 2020-04-15T05:41:52.153Z] 05:41:52 INFO - rcx = 0x0000025d83154220 rbx = 0x0000000000000000
[task 2020-04-15T05:41:52.153Z] 05:41:52 INFO - rsi = 0x0000025d83154220 rdi = 0x0000000000000000
[task 2020-04-15T05:41:52.153Z] 05:41:52 INFO - rbp = 0x000000ce77bfddb0 rsp = 0x000000ce77bfdd58
[task 2020-04-15T05:41:52.154Z] 05:41:52 INFO - r8 = 0x0000000000001240 r9 = 0x0000000000000000
[task 2020-04-15T05:41:52.154Z] 05:41:52 INFO - r10 = 0x00000000ffffffff r11 = 0x0000025d9ff01310
[task 2020-04-15T05:41:52.154Z] 05:41:52 INFO - r12 = 0x0000000000000000 r13 = 0x0000000000000000
[task 2020-04-15T05:41:52.154Z] 05:41:52 INFO - r14 = 0x0000000000000000 r15 = 0x0000025d83154258
[task 2020-04-15T05:41:52.154Z] 05:41:52 INFO - rip = 0x00007ff905338b64
[task 2020-04-15T05:41:52.155Z] 05:41:52 INFO - Found by: given as instruction pointer in context
[task 2020-04-15T05:41:52.155Z] 05:41:52 INFO - 1 ntdll.dll!RtlSleepConditionVariableSRW + 0xf9
[task 2020-04-15T05:41:52.155Z] 05:41:52 INFO - rbx = 0x0000000000000000 rbp = 0x000000ce77bfddb0
[task 2020-04-15T05:41:52.155Z] 05:41:52 INFO - rsp = 0x000000ce77bfdd60 r12 = 0x0000000000000000
[task 2020-04-15T05:41:52.156Z] 05:41:52 INFO - r13 = 0x0000000000000000 r14 = 0x0000000000000000
[task 2020-04-15T05:41:52.156Z] 05:41:52 INFO - r15 = 0x0000025d83154258 rip = 0x00007ff905301169
[task 2020-04-15T05:41:52.156Z] 05:41:52 INFO - Found by: call frame info
[task 2020-04-15T05:41:52.156Z] 05:41:52 INFO - 2 KERNELBASE.dll!Normalization__NormalizeCharacter + 0x44
[task 2020-04-15T05:41:52.157Z] 05:41:52 INFO - rbx = 0x0000000000000000 rbp = 0x000000ce77bfddb0
[task 2020-04-15T05:41:52.157Z] 05:41:52 INFO - rsp = 0x000000ce77bfdde0 r12 = 0x0000000000000000
[task 2020-04-15T05:41:52.157Z] 05:41:52 INFO - r13 = 0x0000000000000000 r14 = 0x0000000000000000
[task 2020-04-15T05:41:52.157Z] 05:41:52 INFO - r15 = 0x0000025d83154258 rip = 0x00007ff901e9b3e8
[task 2020-04-15T05:41:52.158Z] 05:41:52 INFO - Found by: call frame info
[task 2020-04-15T05:41:52.158Z] 05:41:52 INFO - 3 mozglue.dll!mozilla::detail::ConditionVariableImpl::wait_for(mozilla::detail::MutexImpl&, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator> const&) [ConditionVariable_windows.cpp:6b68d5d74c5a0fc879e48faec9b33d2c617a9568 : 58 + 0x11]
[task 2020-04-15T05:41:52.158Z] 05:41:52 INFO - rbx = 0x0000000000000000 rbp = 0x000000ce77bfddb0
[task 2020-04-15T05:41:52.158Z] 05:41:52 INFO - rsp = 0x000000ce77bfde20 r12 = 0x0000000000000000
[task 2020-04-15T05:41:52.158Z] 05:41:52 INFO - r13 = 0x0000000000000000 r14 = 0x0000000000000000
[task 2020-04-15T05:41:52.159Z] 05:41:52 INFO - r15 = 0x0000025d83154258 rip = 0x00007ff8f76c18c4
[task 2020-04-15T05:41:52.159Z] 05:41:52 INFO - Found by: call frame info
[task 2020-04-15T05:41:52.159Z] 05:41:52 INFO - 4 xul.dll!mozilla::ipc::MessageChannel::WaitForSyncNotify(bool) [WindowsMessageLoop.cpp:6b68d5d74c5a0fc879e48faec9b33d2c617a9568 : 948 + 0x2a]
[task 2020-04-15T05:41:52.159Z] 05:41:52 INFO - rbx = 0x0000000000000000 rbp = 0x000000ce77bfddb0
[task 2020-04-15T05:41:52.160Z] 05:41:52 INFO - rsp = 0x000000ce77bfde50 r12 = 0x0000000000000000
[task 2020-04-15T05:41:52.160Z] 05:41:52 INFO - r13 = 0x0000000000000000 r14 = 0x0000000000000000
[task 2020-04-15T05:41:52.160Z] 05:41:52 INFO - r15 = 0x0000025d83154258 rip = 0x00007ff8ce8d949c
[task 2020-04-15T05:41:52.160Z] 05:41:52 INFO - Found by: call frame info
[task 2020-04-15T05:41:52.161Z] 05:41:52 INFO - 5 xul.dll!mozilla::ipc::MessageChannel::Send(IPC::Message*, IPC::Message*) [MessageChannel.cpp:6b68d5d74c5a0fc879e48faec9b33d2c617a9568 : 1550 + 0xd]
[task 2020-04-15T05:41:52.161Z] 05:41:52 INFO - rbx = 0x0000000000000000 rbp = 0x000000ce77bfddb0
[task 2020-04-15T05:41:52.161Z] 05:41:52 INFO - rsp = 0x000000ce77bfe040 r12 = 0x0000000000000000
[task 2020-04-15T05:41:52.161Z] 05:41:52 INFO - r13 = 0x0000000000000000 r14 = 0x0000000000000000
[task 2020-04-15T05:41:52.161Z] 05:41:52 INFO - r15 = 0x0000025d83154258 rip = 0x00007ff8ce8d4da6
[task 2020-04-15T05:41:52.162Z] 05:41:52 INFO - Found by: call frame info
[task 2020-04-15T05:41:52.162Z] 05:41:52 INFO - 6 xul.dll!mozilla::layers::PCompositorBridgeChild::SendFlushRendering() [PCompositorBridgeChild.cpp: : 1235 + 0x10]
[task 2020-04-15T05:41:52.162Z] 05:41:52 INFO - rbx = 0x0000000000000000 rbp = 0x000000ce77bfddb0
[task 2020-04-15T05:41:52.163Z] 05:41:52 INFO - rsp = 0x000000ce77bfe1e0 r12 = 0x0000000000000000
[task 2020-04-15T05:41:52.163Z] 05:41:52 INFO - r13 = 0x0000000000000000 r14 = 0x0000000000000000
[task 2020-04-15T05:41:52.163Z] 05:41:52 INFO - r15 = 0x0000025d83154258 rip = 0x00007ff8cea8d275
[task 2020-04-15T05:41:52.164Z] 05:41:52 INFO - Found by: call frame info
[task 2020-04-15T05:41:52.164Z] 05:41:52 INFO - 7 xul.dll!nsViewManager::Refresh(nsView*, mozilla::gfx::IntRegionTyped<mozilla::LayoutDevicePixel> const&) [nsViewManager.cpp:6b68d5d74c5a0fc879e48faec9b33d2c617a9568 : 335 + 0xc]
[task 2020-04-15T05:41:52.164Z] 05:41:52 INFO - rbx = 0x0000000000000000 rbp = 0x000000ce77bfddb0
[task 2020-04-15T05:41:52.164Z] 05:41:52 INFO - rsp = 0x000000ce77bfe2c0 r12 = 0x0000000000000000
[task 2020-04-15T05:41:52.165Z] 05:41:52 INFO - r13 = 0x0000000000000000 r14 = 0x0000000000000000
[task 2020-04-15T05:41:52.165Z] 05:41:52 INFO - r15 = 0x0000025d83154258 rip = 0x00007ff8cea8cc8d
[task 2020-04-15T05:41:52.165Z] 05:41:52 INFO - Found by: call frame info
[task 2020-04-15T05:41:52.165Z] 05:41:52 INFO - 8 xul.dll!nsViewManager::PaintWindow(nsIWidget*, mozilla::gfx::IntRegionTyped<mozilla::LayoutDevicePixel> const&) [nsViewManager.cpp:6b68d5d74c5a0fc879e48faec9b33d2c617a9568 : 691 + 0xe]
[task 2020-04-15T05:41:52.165Z] 05:41:52 INFO - rbx = 0x0000000000000000 rbp = 0x000000ce77bfddb0
[task 2020-04-15T05:41:52.166Z] 05:41:52 INFO - rsp = 0x000000ce77bfe330 r12 = 0x0000000000000000
[task 2020-04-15T05:41:52.166Z] 05:41:52 INFO - r13 = 0x0000000000000000 r14 = 0x0000000000000000
[task 2020-04-15T05:41:52.166Z] 05:41:52 INFO - r15 = 0x0000025d83154258 rip = 0x00007ff8cea8cb57
[task 2020-04-15T05:41:52.166Z] 05:41:52 INFO - Found by: call frame info
[task 2020-04-15T05:41:52.166Z] 05:41:52 INFO - 9 xul.dll!nsView::PaintWindow(nsIWidget*, mozilla::gfx::IntRegionTyped<mozilla::LayoutDevicePixel>) [nsView.cpp:6b68d5d74c5a0fc879e48faec9b33d2c617a9568 : 1055 + 0xb]
[task 2020-04-15T05:41:52.167Z] 05:41:52 INFO - rbx = 0x0000000000000000 rbp = 0x000000ce77bfddb0
[task 2020-04-15T05:41:52.167Z] 05:41:52 INFO - rsp = 0x000000ce77bfe370 r12 = 0x0000000000000000
[task 2020-04-15T05:41:52.167Z] 05:41:52 INFO - r13 = 0x0000000000000000 r14 = 0x0000000000000000
[task 2020-04-15T05:41:52.167Z] 05:41:52 INFO - r15 = 0x0000025d83154258 rip = 0x00007ff8cea8c92b
[task 2020-04-15T05:41:52.167Z] 05:41:52 INFO - Found by: call frame info
[task 2020-04-15T05:41:52.167Z] 05:41:52 INFO - 10 xul.dll!nsWindow::OnPaint(HDC__, unsigned int) [nsWindowGfx.cpp:6b68d5d74c5a0fc879e48faec9b33d2c617a9568 : 387 + 0x12]
[task 2020-04-15T05:41:52.168Z] 05:41:52 INFO - rbx = 0x0000000000000000 rbp = 0x000000ce77bfddb0
[task 2020-04-15T05:41:52.168Z] 05:41:52 INFO - rsp = 0x000000ce77bfe3d0 r12 = 0x0000000000000000
[task 2020-04-15T05:41:52.168Z] 05:41:52 INFO - r13 = 0x0000000000000000 r14 = 0x0000000000000000
[task 2020-04-15T05:41:52.168Z] 05:41:52 INFO - r15 = 0x0000025d83154258 rip = 0x00007ff8ce9da0da
[task 2020-04-15T05:41:52.168Z] 05:41:52 INFO - Found by: call frame info
[task 2020-04-15T05:41:52.168Z] 05:41:52 INFO - 11 xul.dll!nsWindow::ProcessMessage(unsigned int, unsigned long long&, long long&, long long
) [nsWindow.cpp:6b68d5d74c5a0fc879e48faec9b33d2c617a9568 : 5323 + 0xd]
[task 2020-04-15T05:41:52.169Z] 05:41:52 INFO - rbx = 0x0000000000000000 rbp = 0x000000ce77bfddb0
[task 2020-04-15T05:41:52.169Z] 05:41:52 INFO - rsp = 0x000000ce77bfe680 r12 = 0x0000000000000000
[task 2020-04-15T05:41:52.169Z] 05:41:52 INFO - r13 = 0x0000000000000000 r14 = 0x0000000000000000
[task 2020-04-15T05:41:52.169Z] 05:41:52 INFO - r15 = 0x0000025d83154258 rip = 0x00007ff8ce78cd7e
[task 2020-04-15T05:41:52.169Z] 05:41:52 INFO - Found by: call frame info
[task 2020-04-15T05:41:52.169Z] 05:41:52 INFO - 12 xul.dll!static nsWindow::WindowProcInternal(HWND__, unsigned int, unsigned long long, long long) [nsWindow.cpp:6b68d5d74c5a0fc879e48faec9b33d2c617a9568 : 4809 + 0x1e]
[task 2020-04-15T05:41:52.170Z] 05:41:52 INFO - rbx = 0x0000000000000000 rbp = 0x000000ce77bfddb0
[task 2020-04-15T05:41:52.170Z] 05:41:52 INFO - rsp = 0x000000ce77bfe8a0 r12 = 0x0000000000000000
[task 2020-04-15T05:41:52.170Z] 05:41:52 INFO - r13 = 0x0000000000000000 r14 = 0x0000000000000000
[task 2020-04-15T05:41:52.170Z] 05:41:52 INFO - r15 = 0x0000025d83154258 rip = 0x00007ff8ce78ba2b
[task 2020-04-15T05:41:52.170Z] 05:41:52 INFO - Found by: call frame info
[task 2020-04-15T05:41:52.171Z] 05:41:52 INFO - 13 xul.dll!CallWindowProcCrashProtected(long long (
)(HWND__, unsigned int, unsigned long long, long long), HWND__, unsigned int, unsigned long long, long long) [nsCrashOnException.cpp:6b68d5d74c5a0fc879e48faec9b33d2c617a9568 : 27 + 0xb]
[task 2020-04-15T05:41:52.171Z] 05:41:52 INFO - rbx = 0x0000000000000000 rbp = 0x000000ce77bfddb0
[task 2020-04-15T05:41:52.171Z] 05:41:52 INFO - rsp = 0x000000ce77bfe940 r12 = 0x0000000000000000
[task 2020-04-15T05:41:52.171Z] 05:41:52 INFO - r13 = 0x0000000000000000 r14 = 0x0000000000000000
[task 2020-04-15T05:41:52.171Z] 05:41:52 INFO - r15 = 0x0000025d83154258 rip = 0x00007ff8ce78b90f
[task 2020-04-15T05:41:52.171Z] 05:41:52 INFO - Found by: call frame info
[task 2020-04-15T05:41:52.171Z] 05:41:52 INFO - 14 xul.dll!static nsWindow::WindowProc(HWND__*, unsigned int, unsigned long long, long long) [nsWindow.cpp:6b68d5d74c5a0fc879e48faec9b33d2c617a9568 : 4762 + 0x1a]
[task 2020-04-15T05:41:52.172Z] 05:41:52 INFO - rbx = 0x0000000000000000 rbp = 0x000000ce77bfddb0
[task 2020-04-15T05:41:52.172Z] 05:41:52 INFO - rsp = 0x000000ce77bfe970 r12 = 0x0000000000000000
[task 2020-04-15T05:41:52.172Z] 05:41:52 INFO - r13 = 0x0000000000000000 r14 = 0x0000000000000000
[task 2020-04-15T05:41:52.172Z] 05:41:52 INFO - r15 = 0x0000025d83154258 rip = 0x00007ff8ce78b8aa
[task 2020-04-15T05:41:52.172Z] 05:41:52 INFO - Found by: call frame info
[task 2020-04-15T05:41:52.172Z] 05:41:52 INFO - 15 user32.dll!RtlpWaitOnCriticalSection + 0x14
[task 2020-04-15T05:41:52.172Z] 05:41:52 INFO - rbx = 0x0000000000000000 rbp = 0x000000ce77bfddb0
[task 2020-04-15T05:41:52.173Z] 05:41:52 INFO - rsp = 0x000000ce77bfe9d0 r12 = 0x0000000000000000
[task 2020-04-15T05:41:52.173Z] 05:41:52 INFO - r13 = 0x0000000000000000 r14 = 0x0000000000000000
[task 2020-04-15T05:41:52.173Z] 05:41:52 INFO - r15 = 0x0000025d83154258 rip = 0x00007ff9028abc50
[task 2020-04-15T05:41:52.173Z] 05:41:52 INFO - Found by: call frame info
[task 2020-04-15T05:41:52.173Z] 05:41:52 INFO - 16 0xfffffda28b9ded8f
[task 2020-04-15T05:41:52.173Z] 05:41:52 INFO - rbx = 0x0000000000000000 rbp = 0x000000ce77bfddb0
[task 2020-04-15T05:41:52.174Z] 05:41:52 INFO - rsp = 0x000000ce77bfea80 r12 = 0x0000000000000000
[task 2020-04-15T05:41:52.174Z] 05:41:52 INFO - r13 = 0x0000000000000000 r14 = 0x0000000000000000
[task 2020-04-15T05:41:52.174Z] 05:41:52 INFO - r15 = 0x0000025d83154258 rip = 0xfffffda28b9ded8f
[task 2020-04-15T05:41:52.174Z] 05:41:52 INFO - Found by: call frame info

The first real helpful stack frame seems to be:

[task 2020-04-15T05:41:52.164Z] 05:41:52 INFO - 7 xul.dll!nsViewManager::Refresh(nsView*, mozilla::gfx::IntRegionTyped<mozilla::LayoutDevicePixel> const&) [nsViewManager.cpp:6b68d5d74c5a0fc879e48faec9b33d2c617a9568 : 335 + 0xc]

So related to web painting?

Component: Talos → Web Painting
Product: Testing → Core
Version: Version 3 → unspecified
Summary: Intermittent PROCESS-CRASH | damp | application crashed [unknown top frame] → Intermittent PROCESS-CRASH | <talos> | application crashed [unknown top frame]
See Also: → 1638851

None of these seem to be real crashes but Firefox got killed because of hangs, right? How should Talos or mozcrash itself actually handle such a case?

Flags: needinfo?(gsvelto)
Component: Web Painting → General

Yeah, it looks like the parent process was sitting idle and not responding and thus was killed. See these lines.

Flags: needinfo?(gsvelto)

I assume all the recent entries here are related to bug 1638851. Gerald, what do you say?

Flags: needinfo?(gsquelart)

It looks similar, and started almost at the same time. It may well be the same issue, but I'm not certain. I'll work on both...

Assignee: nobody → gsquelart
Flags: needinfo?(gsquelart)

In the last 7 days this bug failed 21 times, on: macosx1014-64-shippable opt and windows10-64-shippable opt.

Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=306964161&repo=autoland&lineNumber=3484

Gerald: do you have any updates on this bug?

Flags: needinfo?(gsquelart)
Whiteboard: [stockwell needswork:owner]

No updates yet. I'll work on it this coming week..

Severity: normal → S3
Flags: needinfo?(gsquelart)
Priority: P5 → P1

Update:
I haven't found the root cause of these issues yet.
But I can confirm that preventing the Base Profiler from running makes them disappear (in my Try runs at least).
When Base Profiler is running, it seems to make from sub-processes unresponsive, so when the parent process tries to gather profiles from sub-processes, they don't all respond, so the profile is never generated, and the test times out.
I've found a potential bottleneck when copying the Base Profiler data to the Gecko Profiler (after xpcom starts); but after optimizing it, the issue was still present. (This optimization should be done anyway, I'll work on a separate bug for that...)

So with the upcoming new release, I will create a quick-fix patch here, to prevent Base Profiler from running in Talos tests.

I've already opened follow-up bugs:

  • Bug 1648324 will add a timeout when gathering sub-process profiles; this should allow the Base Profiler again. And it will help whenever a process gets stuck for any reason.
  • Bug 1648325 will investigate the root cause, to hopefully find why the Base Profiler seems to block processes.

The Base Profiler is still recent and barely used, so it may contain some bugs.
With bug 1586939, the Base Profiler is now used more often because it is controlled the same way as the Gecko Profiler.
This has surfaced some intermittent issues, which pollute existing tests.

Until the root cause is found (see bug 1648325), setting MOZ_PROFILER_STARTUP_NO_BASE=1 prevents the Base Profiler from running. This may be used where problems are visible, to diagnostic them and/or reduce them where needed.

Talos tests suffer the most from intermittents that seem due to the Base Profiler.
So until symptoms are reduced (bug 1648324) or the root cause is fixed (bug 1648325), Talos tests will run without the Base Profiler.

Depends on D81018

Pushed by gsquelart@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/67d4045e7c5d MOZ_PROFILER_STARTUP_NO_BASE=1 prevents Base Profiler from running when MOZ_PROFILER_STARTUP is set - r=canaltinova https://hg.mozilla.org/integration/autoland/rev/efd7b443dea5 Set MOZ_PROFILER_STARTUP_NO_BASE=1 in Talos tests - r=canaltinova
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla79

Alexandru, can you please file a new issue? The underlying problem here is different. Thanks.

Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Flags: needinfo?(malexandru)
Resolution: --- → FIXED
Target Milestone: --- → mozilla79

Done, thank you too!

Flags: needinfo?(malexandru)
See Also: → 1649139
See Also: → 1674906
See Also: → 1690511
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: