Closed Bug 1528612 Opened 5 years ago Closed 2 years ago

Crash in [@ profiler_get_backtrace]

Categories

(Core :: Gecko Profiler, defect, P3)

Unspecified
Windows 10
defect

Tracking


RESOLVED WORKSFORME
Tracking Status
firefox-esr60 --- unaffected
firefox65 --- wontfix
firefox66 --- wontfix
firefox67 --- fix-optional
firefox68 --- fix-optional

People

(Reporter: calixte, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: crash, regression)

Crash Data

This bug is for crash report bp-9f5bb9f1-fc5c-4604-823d-68a490190216.

Top 10 frames of crashing thread:

0 xul.dll profiler_get_backtrace tools/profiler/core/platform.cpp:3682
1 xul.dll nsObserverService::NotifyObservers xpcom/ds/nsObserverService.cpp:281
2 xul.dll mozilla::net::nsHttpHandler::NotifyObservers netwerk/protocol/http/nsHttpHandler.cpp:823
3 xul.dll mozilla::net::nsHttpChannel::AsyncOpen netwerk/protocol/http/nsHttpChannel.cpp:6289
4 xul.dll void mozilla::net::HttpChannelParent::InvokeAsyncOpen netwerk/protocol/http/HttpChannelParent.cpp:378
5 xul.dll void mozilla::net::HttpChannelParent::TryInvokeAsyncOpen netwerk/protocol/http/HttpChannelParent.cpp:185
6 xul.dll void mozilla::MozPromise<bool, nsresult, 0>::ThenValue<`lambda at z:/build/build/src/netwerk/protocol/http/HttpChannelParent.cpp:668:14', `lambda at z:/build/build/src/netwerk/protocol/http/HttpChannelParent.cpp:672:14'>::DoResolveOrRejectInternal xpcom/threads/MozPromise.h:716
7 xul.dll nsresult mozilla::MozPromise<bool, nsresult, 0>::ThenValueBase::ResolveOrRejectRunnable::Run xpcom/threads/MozPromise.h:392
8 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1162
9 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:474

There is 1 crash in nightly 67 with buildid 20190216093716. Judging from the backtrace, the regression may have been introduced by patch [1] for bug 1526998.

[1] https://hg.mozilla.org/mozilla-central/rev?node=4394e1f91db8

Flags: needinfo?(florian)

I was wondering whether we might be doing this off the main thread on something that isn't thread-safe, but it looks like the observer service always runs on the main thread... so I have no idea how bug 1526998 could have caused this.
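
(For reference, a minimal diagnostic sketch of how one could rule out off-main-thread callers; this is hypothetical and not an actual patch. NS_IsMainThread() and MOZ_DIAGNOSTIC_ASSERT are existing helpers, the surrounding function name is made up.)

#include "mozilla/Assertions.h"
#include "nsThreadUtils.h"

// Hypothetical check at the start of the notification path: if any caller
// reached it off the main thread, this would fire in diagnostic builds.
void AssertNotifyObserversCalledOnMainThread() {
  MOZ_DIAGNOSTIC_ASSERT(NS_IsMainThread(),
                        "observer notifications should happen on the main thread");
}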

It looks like we already had similar crashes in the 64 release when collecting the stack of whatever is causing an invalidation. Markus, any idea?

Flags: needinfo?(florian) → needinfo?(mstange)

(In reply to Florian Quèze [:florian] from comment #1)

> so I have no idea how bug 1526998 could have caused this.

Yeah, there's no way it would have caused it. But it might well have increased the frequency of the crash: we dispatch a lot more observer notifications than we trigger restyles / reflows.
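
(For context, the shape of the instrumentation in question is roughly the following. This is only a sketch, not the actual change from bug 1526998; profiler_is_active() and profiler_get_backtrace() are real profiler entry points, the wrapper function is made up.)

#include "GeckoProfiler.h"

// Sketch: capture a backtrace from a hot notification path only while the
// profiler is running, so the common case stays cheap.
void MaybeCaptureNotificationBacktrace() {
  if (!profiler_is_active()) {
    return;
  }
  // This is the call that shows up as frame 0 of the crashing stack.
  auto backtrace = profiler_get_backtrace();
  // ... hand the backtrace to whatever marker or annotation consumes it ...
}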

Flags: needinfo?(mstange)

David, jrmuizel and I took a look at this crash and it doesn't make much sense to us. The crash address is 0x541, in an instruction that reads something from the stack at offset 0x540 from RSP. (The thing it's reading is the stored-on-stack value for the __security_check_cookie check in the function epilogue.)
However, the crash report has the following value for the RSP register: 0x0000009dd7ffe5b0
So it seems like one place thinks RSP is 0x1 and another place thinks it's something else. Have you seen something like this before? Could this odd state be triggered by the synchronous register dumping + stackwalking earlier in the function?
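
(For readers unfamiliar with /GS: the load at [rsp+540h] is part of the stack-cookie check the compiler emits around functions with large stack buffers. The following is a hand-written illustration of that mechanism with made-up names, not the actual compiler-generated code.)

#include <cstdint>
#include <cstdlib>

uintptr_t gIllustrativeCookie = 0x2b992ddfa232;  // stands in for MSVC's __security_cookie

void IllustrativeGsProtectedFunction() {
  char bigLocalBuffer[0x500];
  uintptr_t frameBase = reinterpret_cast<uintptr_t>(bigLocalBuffer);  // stands in for RSP
  // Prologue: mix the cookie with the frame address and store it on the stack.
  uintptr_t cookieSlot = gIllustrativeCookie ^ frameBase;
  // ... function body that could overflow bigLocalBuffer ...
  // Epilogue: reload the slot (the load that faulted here) and verify it.
  if ((cookieSlot ^ frameBase) != gIllustrativeCookie) {
    std::abort();  // corresponds to __security_check_cookie failing
  }
}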

Flags: needinfo?(dmajor)

Can you send me the minidump?

Flags: needinfo?(mstange)

Sent.

Flags: needinfo?(mstange)

I notice that rax==1. What if a single-bit error turned this...

488b8c2440050000 mov rcx,qword ptr [rsp+540h]

into this...?

488b8c2040050000 mov rcx,qword ptr [rax+540h]

The instruction shows up correctly in the minidump's memory though, so maybe the error was transient or happened somewhere in the instruction fetch.
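
(For what it's worth, the two encodings really are a single bit apart -- the SIB byte 0x24 selects rsp as the base, 0x20 selects rax -- and with rax == 1 the faulting address works out to exactly the reported crash address. A quick, purely illustrative sanity check:)

// Sanity check of the single-bit-flip theory.
static_assert((0x24 ^ 0x20) == 0x04, "the SIB bytes differ in exactly one bit");
static_assert(0x1 + 0x540 == 0x541, "rax==1 plus the displacement matches the crash address");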

But I really don't like it when we point the finger at hardware so quickly, so I hope I'm wrong! Please be on the lookout for additional crash reports that would discredit this theory.

Flags: needinfo?(dmajor)

Yikes!

I will be on the lookout. As far as I can tell, this is the first crash of this particular type that we've seen in this function, but more may appear in the future. (The other crashes I can find are stack overflows, or crashes at other locations in this function or with much higher crash address values.)

Priority: -- → P3

Bulk change for all regression bugs with status-firefox67 as 'fix-optional' to be marked 'affected' for status-firefox68.

QA Whiteboard: qa-not-actionable

Closing because no crashes have been reported for 12 weeks.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WORKSFORME