Closed
Bug 1502424
Opened 6 years ago
Closed 6 years ago
Crash in std::vector<T>::_M_range_insert<T>
Categories
(Toolkit :: Crash Reporting, defect)
Tracking
()
RESOLVED
FIXED
mozilla65
Tracking | Status | |
---|---|---|
firefox-esr60 | --- | unaffected |
firefox63 | --- | unaffected |
firefox64 | --- | wontfix |
firefox65 | --- | fixed |
People
(Reporter: marcia, Assigned: gsvelto)
References
Details
(Keywords: crash, regression)
Crash Data
Attachments
(1 file)
This bug was filed from the Socorro interface and is
report bp-863a06a3-8755-41a9-8e77-6c84d0181026.
=============================================================
Seen while reviewing nightly crash stats: https://bit.ly/2PotBF4. This linux crash dates back to when nightly was in 64. At the top of the stack is BHMgr Processor.
105 crashes, but only 12 installations. No comments.
A few tab related addons seem to be in the correlations:
* (100.0% in signature vs 01.01% overall) Addon "Undo Close Tab" = true
* (100.0% in signature vs 01.08% overall) Addon "Open Tabs Next to Current" = true
* (100.0% in signature vs 01.87% overall) Addon "Snooze Tabs" = true
Top 10 frames of crashing thread:
0 libxul.so void std::vector<unsigned char, google_breakpad::PageStdAllocator<unsigned char> >::_M_range_insert<unsigned char const*> clang/include/c++/4.9.4/ext/new_allocator.h:120
1 libxul.so google_breakpad::FileID::ElfFileIdentifierFromMappedFile clang/include/c++/4.9.4/bits/stl_vector.h:1377
2 libxul.so SharedLibraryAtPath toolkit/crashreporter/google-breakpad/src/common/linux/file_id.cc:162
3 libxul.so SharedLibraryInfo::GetInfoForSelf tools/profiler/core/shared-libraries-linux.cc:253
4 libxul.so mozilla::ProcessHangStackRunnable::Run toolkit/components/backgroundhangmonitor/HangDetails.cpp:334
5 libxul.so nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1245
6 libxul.so NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:530
7 libxul.so mozilla::ipc::MessagePumpForNonMainThreads::Run ipc/glue/MessagePump.cpp:364
8 libxul.so nsThread::ThreadFunc ipc/chromium/src/base/message_loop.cc:325
9 libnspr4.so _pt_root nsprpub/pr/src/pthreads/ptthread.c:201
=============================================================
Comment 1•6 years ago
|
||
This looks like a null deref crash inside breakpad. Ted, any ideas?
Component: General → Crash Reporting
Flags: needinfo?(ted)
Product: Core → Toolkit
Comment 2•6 years ago
|
||
So interestingly this is crashing in the background hang monitor, which calls into some Breakpad code just to get file identifiers for each loaded library. There are some inlined frames here, but it looks like `SharedLibraryAtPath` calls `getId` here:
https://dxr.mozilla.org/mozilla-central/rev/c291143e24019097d087f9307e59b49facaf90cb/tools/profiler/core/shared-libraries-linux.cc#127
and `getId` calls `ElfFileIdentifier` here:
https://dxr.mozilla.org/mozilla-central/rev/c291143e24019097d087f9307e59b49facaf90cb/tools/profiler/core/shared-libraries-linux.cc#105
The actual crash looks to be inside std::vector, it appears to be failing to allocate 14 bytes, I guess? The confounding factor is that this code works with `wasteful_vector`, which is a std::vector with a custom allocator that doesn't use malloc (so it's safe to use during a crash):
https://hg.mozilla.org/mozilla-central/file/3cc04ee79005058d817daf66da7963dfac3f0a3a/toolkit/crashreporter/google-breakpad/src/common/memory_allocator.h#l216
Specifically the caller uses an `auto_wasteful_vector` here, which is a subclass that has some stack space (like `nsAutoTArray`):
https://dxr.mozilla.org/mozilla-central/rev/c291143e24019097d087f9307e59b49facaf90cb/tools/profiler/core/shared-libraries-linux.cc#89
As it turns out `sizeof(MDGUID)` is not actually the best constant there (it actually wants `kDefaultBuildIdSize`, since it's almost certainly going to get an ELF Build ID and they're almost always a SHA-1 hash which is 20 bytes):
https://dxr.mozilla.org/mozilla-central/rev/c291143e24019097d087f9307e59b49facaf90cb/toolkit/crashreporter/breakpad-client/linux/minidump_writer/minidump_writer.cc#634
so it winds up allocating, which goes through this code:
https://hg.mozilla.org/mozilla-central/file/3cc04ee79005058d817daf66da7963dfac3f0a3a/toolkit/crashreporter/google-breakpad/src/common/memory_allocator.h#l74
but apparently that's returning NULL? I'm not sure what's happening there.
We should definitely fix that code in `getId` to use the right constant which ought to avoid heap allocations in 99% of cases.
Flags: needinfo?(ted)
Assignee | ||
Comment 3•6 years ago
|
||
This is really weird, there doesn't seem to be any reasons why that allocation would fail. Also this seems to be happening only within content processes. BTW the allocator is not thread-safe but it's being instanced on the calling stack so a race shouldn't be possible here, right?
Updated•6 years ago
|
Assignee | ||
Comment 4•6 years ago
|
||
We've discussed this a bit on IRC and while what we think is happening is possible it's also quite unlikely. That being said all the crashes but one come from a single machine that has nightly installed under /root (yeah, you read that right). There has been another crash though from a different machine with a more sensible install (under a user's home directory) so this might be a valid bug. If it's what we think it is (mmap() failing under weird conditions) we can mitigate it but a fix would be much more complex.
Assignee | ||
Comment 5•6 years ago
|
||
BTW while triaging I run into other "weird" crashes, this is one of them:
https://crash-stats.mozilla.com/signature/?product=Firefox&signature=mozilla%3A%3Aipc%3A%3AIToplevelProtocol%3A%3AToplevelState%3A%3ACreateSharedMemory&date=%3E%3D2018-10-26T17%3A39%3A13.000Z&date=%3C2018-11-02T16%3A39%3A13.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_columns=cpu_info&_sort=-date&page=1
The CPU info and Firefox installation path match the one here. This crash is dereferencing a null pointer that has been returned in the form of a shared mapped segment; said segment was allocated via mmap() which most likely failed and lead us to return a null pointer responsible for the crash here:
https://searchfox.org/mozilla-central/rev/50ba1dd30cf013bddce1ae756f1b3c95b26f0628/ipc/glue/Shmem.cpp#90
So it's pretty obvious to me that our friend here is running out of mmap()'able memory and we're dealing with this condition very poorly in various parts of our codebase. We should probably OOM-crash. So again, we can mitigate this bug (and we probably should) but the real fix would be to handle mmap() failures correctly. The problem is that would require an audit of almost the entire codebase since we're using it in a number of different places.
Assignee | ||
Updated•6 years ago
|
Crash Signature: [@ std::vector<T>::_M_range_insert<T>] → [@ std::vector<T>::_M_range_insert<T>]
[@ OOM | large | std::vector<T>::_M_range_insert<T>]
Assignee | ||
Comment 6•6 years ago
|
||
I've given this a little thought and I finally realized what could cause this issue: since this user is most likely running Firefox as root it's possible that he has a different ulimit configuration than a regular user. In fact this might be happening in a container or something like that and one of the limits that can be set is the amount of virtual memory one can allocate. If that's the case then mmap() failing is a natural consequence. Let's mitigate the issue here as Ted has described then I'll open a separate bug for auditing mmap() call sites.
Assignee | ||
Updated•6 years ago
|
Assignee: nobody → gsvelto
Status: NEW → ASSIGNED
Assignee | ||
Comment 7•6 years ago
|
||
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/dc0584144f76
Accommodate for larger ELF identifiers to prevent dynamic allocations r=ted
Comment 9•6 years ago
|
||
bugherder |
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
Updated•6 years ago
|
status-firefox-esr60:
--- → unaffected
You need to log in
before you can comment on or make changes to this bug.
Description
•