Crash in [@ mozilla::ipc::MessagePumpForNonMainThreads::Run]
Categories
(Core :: IPC, defect, P2)
Tracking
Release | Tracking | Status |
---|---|---|
firefox81 | --- | unaffected |
firefox82 | + | wontfix |
firefox83 | --- | wontfix |
firefox84 | --- | wontfix |
People
(Reporter: aryx, Unassigned)
Details
(Keywords: crash, regression, sec-moderate)
Crash Data
Crash frequency for this signature has increased on Windows 7 x86, starting with the 82 betas. Each crashing installation now reports ~5-6 crashes, compared with ~1-2 for 81. 82.0b2 also had 12 installations reporting crashes, more than any of the other betas.
Crash report: https://crash-stats.mozilla.org/report/index/e394de81-a364-4a13-9aba-88b3b0200925
Top 10 frames of crashing thread:
0 @0xcedd74
1 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1234
2 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run ipc/glue/MessagePump.cpp:302
3 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:327
4 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:309
5 xul.dll static nsThread::ThreadFunc xpcom/threads/nsThread.cpp:442
6 nss3.dll _PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c:399
7 nss3.dll pr_root nsprpub/pr/src/md/windows/w95thred.c:139
8 ucrtbase.dll thread_start<unsigned int >
9 kernel32.dll BaseThreadInitThunk
Comment 1•4 years ago
These crashes are all happening while spinning the event loop on the socket thread, which I think indicates a networking issue.
Comment 2•4 years ago
Note that this is a startup crash, and it looks like it is happening quite early. Some of the crash reports were happening when we were setting up the JS context. Maybe there's some race with setting up the socket thread?
Comment 3•4 years ago
Aggregating by crash reason, we see several:
Rank | Crash reason | Count | Share |
---|---|---|---|
1 | EXCEPTION_ACCESS_VIOLATION_EXEC | 168 | 65.62 % |
2 | EXCEPTION_ACCESS_VIOLATION_READ | 56 | 21.88 % |
3 | EXCEPTION_ACCESS_VIOLATION_WRITE | 16 | 6.25 % |
4 | EXCEPTION_GUARD_PAGE | 5 | 1.95 % |
5 | EXCEPTION_ILLEGAL_INSTRUCTION | 3 | 1.17 % |
6 | EXCEPTION_BREAKPOINT | 2 | 0.78 % |
7 | EXCEPTION_STACK_BUFFER_OVERRUN | 2 | 0.78 % |
8 | SIGSEGV / SEGV_MAPERR | 2 | 0.78 % |
9 | EXCEPTION_PRIV_INSTRUCTION | 1 | 0.39 % |
10 | SIGSEGV / 0x00000000 | 1 | 0.39 % |
FWIW, on versions older than 82.0b4 I see the same pattern on other threads as well, not only the socket thread, with the main thread in various states but, AFAICS, always during initialization. On 82.0b4, however, it seems to happen on the socket thread only. This might be just a coincidence, though.
Comment 5•4 years ago
Just another detail: with 82.0b4 this seems to happen only on Windows 7 in 32-bit mode; with older builds there are also some occurrences on more modern OS versions.
Comment 6•4 years ago
I have now had a chance to look at a minidump; here is the stack:
0c44dd74() Unknown
[Frames below may be incorrect and/or missing.] Unknown
xul.dll!nsThread::ProcessNextEvent(bool aMayWait, bool * aResult) Line 1239 C++
[Inline Frame] xul.dll!NS_ProcessNextEvent(nsIThread * aThread, bool aMayWait) Line 513 C++
xul.dll!mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate * aDelegate) Line 302 C++
[Inline Frame] xul.dll!MessageLoop::RunInternal() Line 334 C++
xul.dll!MessageLoop::RunHandler() Line 328 C++
xul.dll!MessageLoop::Run() Line 310 C++
xul.dll!nsThread::ThreadFunc(void * aArg) Line 444 C++
nss3.dll!_PR_NativeRunThread(void * arg) Line 399 C
nss3.dll!pr_root(void * arg) Line 139 C
ucrtbase.dll!thread_start<unsigned int (__stdcall*)(void *)>() Unknown
kernel32.dll!@BaseThreadInitThunk@12() Unknown
ntdll.dll!___RtlUserThreadStart@8() Unknown
ntdll.dll!__RtlUserThreadStart@8() Unknown
Comment 7•4 years ago
Since at least the top of the stack is corrupted, we can't know for sure, but let's assume that the bottom of the stack is valid.
Then we are somewhere in event->Run() (called at https://searchfox.org/mozilla-central/rev/9c72508fcf2bba709a5b5b9eae9da35e0c707baa/xpcom/threads/nsThread.cpp#1197), and what actually went wrong depends on the type of the nsIRunnable, which we don't know.
Is it possible to (temporarily?) add a crash annotation that is filled in before calling event->Run() with the type of the event (since we have no RTTI, maybe the address of the vtable?), so that if a crash happens inside that call, we can at least tell which type of event it was and narrow this down?
Not sure who is knowledgeable here... Gabriele, do you have an idea, or could you redirect this?
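To make the vtable-address idea concrete, here is a minimal standalone sketch. It is not Gecko code: `Runnable`, `gCurrentEventVTable`, `VTableAddressOf`, `AutoRecordEventType`, `RunEvent` and `RecordingEvent` are all hypothetical names, and reading the first pointer-sized word of the object as the vtable pointer is an ABI assumption (it holds on the usual Itanium and MSVC layouts for single inheritance, but the C++ standard does not guarantee it).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for nsIRunnable; only the virtual Run() matters here.
struct Runnable {
  virtual ~Runnable() = default;
  virtual void Run() = 0;
};

// Hypothetical per-thread slot that a crash reporter could read at crash time.
// 0 means "no event is currently running on this thread".
thread_local std::uintptr_t gCurrentEventVTable = 0;

// Copies the first pointer-sized word of the object. ASSUMPTION: on the
// common ABIs this word is the vtable pointer; this is not portable C++.
inline std::uintptr_t VTableAddressOf(const Runnable* aEvent) {
  std::uintptr_t vtable = 0;
  std::memcpy(&vtable, static_cast<const void*>(aEvent), sizeof(vtable));
  return vtable;
}

// RAII guard: records the event's vtable address while Run() executes and
// restores the previous value on exit, so nested event loops stay correct.
class AutoRecordEventType {
 public:
  explicit AutoRecordEventType(const Runnable* aEvent)
      : mPrevious(gCurrentEventVTable) {
    gCurrentEventVTable = VTableAddressOf(aEvent);
  }
  ~AutoRecordEventType() { gCurrentEventVTable = mPrevious; }
  AutoRecordEventType(const AutoRecordEventType&) = delete;
  AutoRecordEventType& operator=(const AutoRecordEventType&) = delete;

 private:
  std::uintptr_t mPrevious;
};

// Where the guard would sit relative to the event->Run() call.
inline void RunEvent(Runnable* aEvent) {
  AutoRecordEventType record(aEvent);
  aEvent->Run();
}

// Hypothetical event that remembers what the slot held while it was running.
struct RecordingEvent final : Runnable {
  std::uintptr_t mSeen = 0;
  void Run() override { mSeen = gCurrentEventVTable; }
};
```

The recorded address would still have to be mapped back to a symbol offline, for example by resolving it against the module's symbols, which is cheaper at runtime than retrieving a name string.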
Comment 8•4 years ago
Nika introduced something similar for the main thread in bug 1608158. It gathers the current executable name (if set) and adds it to the crash report. She also added a RAII class to make adding this kind of annotation simpler.
Comment 9•4 years ago
One thing to keep in mind is that retrieving a runnable's name is a rather expensive operation, which is why it's enabled only on Nightly and only on the main thread.
Comment 10•4 years ago
I'll crack open a minidump for you in VS to see if I can find something interesting. One thing to note is that not all of the crashes are on the socket thread, but most of them are, so we might be dealing with an actual stability issue in the networking code plus some noise from unrelated crash reports.
Comment 11•4 years ago
The beta crash spike seems to be reverting back to more normal levels, but we have had a longstanding crash in this signature.
Comment 12•4 years ago
FYI I opened up a minidump with VS but couldn't find anything interesting in there.
Comment 13•3 years ago
EKR told me he's hit this twice this week using Nightly on an Intel MacBook Pro. Not reproducible, unfortunately.
Comment 14•3 years ago
FWIW, I took this as a reminder to look at another crash dump. This one shows that we trigger a MOZ_RELEASE_ASSERT(CorePS::Exists()) during AutoProfilerLabel::Push(...), which most likely means that we try to dispatch a Runnable before we ever created or after we already destroyed CorePS.
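The lifetime problem described here can be illustrated with a small standalone sketch. It is not the profiler's code: `Core` and `TryPushLabel` are hypothetical stand-ins for `CorePS` and the label push, and the sketch returns `false` where the real code hits a release assertion. It is single-threaded on purpose and ignores the locking a real implementation would need.

```cpp
#include <cassert>

// Hypothetical stand-in for a process-wide singleton such as CorePS.
class Core {
 public:
  static void Create() {
    if (!sInstance) {
      sInstance = new Core();
    }
  }
  static void Destroy() {
    delete sInstance;
    sInstance = nullptr;
  }
  static bool Exists() { return sInstance != nullptr; }

 private:
  Core() = default;
  static Core* sInstance;
};

Core* Core::sInstance = nullptr;

// Hypothetical label push. Work that runs before Create() or after Destroy()
// takes the early return; in the crash above the equivalent check is a
// release assertion, so the process aborts instead.
inline bool TryPushLabel(const char* aLabel) {
  if (!Core::Exists()) {
    return false;
  }
  (void)aLabel;  // A real implementation would record the label here.
  return true;
}
```

Both failure windows behave the same way from the caller's side, which is why the assertion alone cannot tell whether the runnable ran too early or too late.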
Comment 15•2 years ago
I have not seen anything related to networking in ~10 crashes I looked at. They all crash on 3 different threads.
Comment 16•8 months ago
There are a bunch of different crashes with this signature that appear to have different causes, running on different dedicated threads and crashing at different places in the named function. Not sure it's useful to lump these all together. It's rarely a "startup" crash any more, and there is no sign of the socket thread.
One cluster worth looking into is a bunch of Linux crashes in 116/117 on the Compositor thread that crash on our UAF poison value +0x18, like bp-7d5f7415-0f63-4a7a-b0b2-372140230710.
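For readers unfamiliar with the "poison value +0x18" pattern, the arithmetic can be sketched as follows. This is an illustration only: it assumes the allocator fills freed memory with the byte 0xe5 and a 64-bit process, and `Widget`, `PoisonedPointerValue` and `FaultAddress` are hypothetical names. Nothing is dereferenced; the code only computes the address a member access through a stale pointer would touch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// ASSUMPTION: freed heap memory is overwritten with this fill byte.
constexpr unsigned char kPoisonByte = 0xe5;

// Hypothetical object whose member mValue sits at offset 0x18.
struct Widget {
  std::uint64_t mA;      // offset 0x00
  std::uint64_t mB;      // offset 0x08
  std::uint64_t mC;      // offset 0x10
  std::uint32_t mValue;  // offset 0x18
};
static_assert(offsetof(Widget, mValue) == 0x18, "layout assumption");

// The value a 64-bit pointer takes when it is loaded from poisoned memory.
inline std::uint64_t PoisonedPointerValue() {
  std::uint64_t value = 0;
  std::memset(&value, kPoisonByte, sizeof(value));
  return value;
}

// Address that a member access at aOffset through such a pointer would hit.
inline std::uint64_t FaultAddress(std::size_t aOffset) {
  return PoisonedPointerValue() + aOffset;
}
```

Under these assumptions, a crash address of the poison pattern plus 0x18 suggests that a pointer was loaded from already-freed memory and a field at offset 0x18 of the pointed-to object was then accessed.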