Closed Bug 897870 Opened 11 years ago Closed 5 years ago

crash in mozilla::Logger::~Logger with abort message: "file e:/builds/.../ipc/chromium/src/base/pickle.cc, line 136" (Pickle::Pickle)

Categories

(Core :: IPC, defect, P3)

25 Branch
All
Windows 7
defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox24 --- unaffected
firefox25 --- wontfix
firefox26 --- wontfix
firefox27 + wontfix
firefox28 --- wontfix
firefox29 --- affected

People

(Reporter: scoobidiver, Unassigned)

Details

(Keywords: crash, regression, topcrash, Whiteboard: [leave open])

Crash Data

Attachments

(4 files, 1 obsolete file)

It started spiking in 25.0a1/20130724. The regression range for the spike is: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=5ceea82a79c7&tochange=2983ca6d4d1a It might be a regression from bug 884061. Signature mozalloc_abort(char const* const) | NS_DebugBreak | mozilla::Logger::~Logger() More Reports Search UUID 8720cf38-027e-4bad-b5b2-cf0c32130724 Date Processed 2013-07-24 16:12:27.638528 Uptime 82 Last Crash 1103052 seconds before submission Install Age 82 since version was first installed. Install Time 2013-07-24 16:09:53 Product Firefox Version 25.0a1 Build ID 20130724030204 Release Channel nightly OS Windows NT OS Version 6.1.7601 Service Pack 1 Build Architecture x86 Build Architecture Info AuthenticAMD family 16 model 4 stepping 2 | 4 Crash Reason EXCEPTION_BREAKPOINT Crash Address 0x72f511cd App Notes AdapterVendorID: 0x10de, AdapterDeviceID: 0x11c6, AdapterSubsysID: 35541458, AdapterDriverVersion: 9.18.13.1422 D2D? D2D+ DWrite? DWrite+ D3D10 Layers? D3D10 Layers+ xpcom_runtime_abort([Parent 2444] ###!!! ABORT: file e:/builds/moz2_slave/m-cen-w32-ntly-000000000000000/build/ipc/chromium/src/base/pickle.cc, line 136) Adapter Vendor ID 0x10de Adapter Device ID 0x11c6 Total Virtual Memory 2726166528 Available Virtual Memory 331247616 System Memory Use Percentage 94 Available Page File 13847392256 Available Physical Memory 169476096 Frame Module Signature Source 0 mozalloc.dll mozalloc_abort(char const * const) memory/mozalloc/mozalloc_abort.cpp 1 xul.dll NS_DebugBreak xpcom/base/nsDebugImpl.cpp 2 xul.dll mozilla::Logger::~Logger() ipc/chromium/src/base/logging.cc 3 xul.dll Pickle::Pickle(Pickle const &) ipc/chromium/src/base/pickle.cc 4 xul.dll IPC::Message::Message(IPC::Message const &) ipc/chromium/src/chrome/common/ipc_message.cc 5 xul.dll std::deque<IPC::Message,std::allocator<IPC::Message> >::push_back(IPC::Message const &) c:/tools/msvs10/vc/include/deque 6 xul.dll mozilla::ipc::RPCChannel::OnMessageReceivedFromLink(IPC::Message const &) ipc/glue/RPCChannel.cpp 7 xul.dll mozilla::ipc::AsyncChannel::ProcessLink::OnMessageReceived(IPC::Message const &) ipc/glue/AsyncChannel.cpp 8 xul.dll IPC::Channel::ChannelImpl::ProcessIncomingMessages(base::MessagePumpForIO::IOContext *,unsigned long) ipc/chromium/src/chrome/common/ipc_channel_win.cc 9 xul.dll IPC::Channel::ChannelImpl::OnIOCompleted(base::MessagePumpForIO::IOContext *,unsigned long,unsigned long) ipc/chromium/src/chrome/common/ipc_channel_win.cc 10 xul.dll base::MessagePumpForIO::WaitForIOCompletion(unsigned long,base::MessagePumpForIO::IOHandler *) ipc/chromium/src/base/message_pump_win.cc 11 xul.dll base::MessagePumpForIO::DoRunLoop() ipc/chromium/src/base/message_pump_win.cc 12 xul.dll base::MessagePumpWin::RunWithDispatcher(base::MessagePump::Delegate *,base::MessagePumpWin::Dispatcher *) ipc/chromium/src/base/message_pump_win.cc 13 xul.dll base::MessagePumpWin::Run(base::MessagePump::Delegate *) ipc/chromium/src/base/message_pump_win.h 14 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc 15 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc 16 xul.dll base::Thread::ThreadMain() ipc/chromium/src/base/thread.cc 17 xul.dll `anonymous namespace'::ThreadFunc(void *) tools/profiler/platform-win32.cc 18 kernel32.dll BaseThreadInitThunk 19 ntdll.dll __RtlUserThreadStart 20 ntdll.dll _RtlUserThreadStart More reports at: https://crash-stats.mozilla.com/report/list?product=Firefox&signature=mozalloc_abort%28char+const*+const%29+|+NS_DebugBreak+|+mozilla%3A%3ALogger%3A%3A~Logger%28%29 https://crash-stats.mozilla.com/report/list?product=Firefox&signature=mozalloc_abort%28char+const*+const%29+|+NS_DebugBreak
It's #12 browser crasher in 25.0a2 and #60 in 26.0a1.
This is currently #8 on 27 betas and has increased in volume by a factor of 3-4 between 27beta2 and 27beta4/5. Nothing in correlations.
This is an allocation failure. It looks like real OOM, not heap corruption. I do wonder whether we're pickling small things or whether we're sending potentially large messages across. But probably there's not much to do about this except figure out memory issues.
This spiked significantly on the beta channel when 27 beta 1 was pushed out. It went from ~3 crashes per million ADI in Firefox 26 beta to 100-120 crashes per ADI in Firefox 27 beta. Benjamin, given your comment #3, is this something we need to have someone look at?
Flags: needinfo?(benjamin)
I can't tell how big these allocations are. From code inspection, the allocation size ought to be other.header_size_ + other.header_->payload_size. other.header_size_ is always 0x1c, but we don't have the memory for payload_size (and it doesn't appear to have any remnants in stack or registers either).
https://crash-analysis.mozilla.com/bsmedberg/LoggerAbort-nightly.svg This spiked on nightly when we fixed the breakpad reservation (bug 943051) and I suspect that it spiked on beta at the same time: remember that we landed the memory reservation into the last beta for FF26. So I don't think this is something to worry about in FF27 (just moving from empty-dump). We should however use NS_ABORT_OOM to annotate how big the buffer actually is, to determine whether there are very large IPC buffers that there shouldn't be.
Flags: needinfo?(benjamin)
> mozalloc_abort(char const* const) | NS_DebugBreak | mozilla::Logger::~Logger() This signature has risen to the #9 topcrash on Aurora 28 and now accounts for 1.12% of crashes in Firefox 28. The following is the current 7-day ranks. Firefox 26: n/a Firefox 27: #7 @ 0.90% Firefox 28: #9 @ 1.12% Firefox 29 #47 @ 0.30% I'm not sure what has caused this to rise but I'm nominating for tracking.
Comment on attachment 8368055 [details] [diff] [review] Log allocation size on failure in Pickle::Pickle, rev. 1 I'm tempted to suggest doing the check in Resize -- it would help avoid the copy/paste of the size parameter, and catch a couple other constructors that would null-crash today. But that would change the fallibility of the call from BeginWrite [1] and I don't know if that's desirable. What do you think? [1] http://mxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/pickle.cc#486
If all of the callers treat it as infallible, that doesn't sound like a problem.
(In reply to Benjamin Smedberg [:bsmedberg] from comment #10) > If all of the callers treat it as infallible, that doesn't sound like a > problem. BeginWrite (and its callers, and so on) appear to be properly plumbed for fallible allocations.
Tracking this based on Comment 8
Comment on attachment 8368055 [details] [diff] [review] Log allocation size on failure in Pickle::Pickle, rev. 1 Review of attachment 8368055 [details] [diff] [review]: ----------------------------------------------------------------- Ok let's not worry about BeginWrite for now. I do see some crash reports in Pickle::Pickle(int) so that might be worth adding too.
Attachment #8368055 - Flags: review?(dmajor) → review+
Mozilla/5.0 (Windows NT 6.3; rv:29.0) Gecko/20100101 Firefox/29.0 I've crashed Firefox Aurora (build ID: 20140206004003) several times on Windows 8.1 32bit. STR: 1. Open Firefox and go to Options -> Tabs and un-check "Don't load tabs until selected" 2. Open 3 tabs and close Firefox. 3. Start Firefox and restore previous session. 4. Open a new tab. 5. Right click on the new tab and select close tab. 6. Right click on one of the tabs. 7. (optional - sometimes it crashes after previous step) Create a new tab and right click on it -> Close tab. Results: Firefox crashes. This is intermittent, I was able to reproduce it only 5 times out of 20+ attempts. I could not reproduce this on Windows 8.1 64bit. Reports: bp-6edeeb1a-398a-468e-b4a0-1057b2140207 bp-5d8c83ba-758c-4516-9d5f-8cbff2140207 bp-890d48e8-cb0a-4494-8e9c-442e72140207 bp-a6ed8467-e192-4b9a-ac33-bdf832140207 bp-cc4d212d-6e66-4c0e-aa8b-5e2f22140207
Keywords: checkin-needed
Whiteboard: [leave open]
Comment on attachment 8374177 [details] [diff] [review] Log allocation size on failure in Pickle::Pickle, rev. 1.1 https://hg.mozilla.org/integration/mozilla-inbound/rev/f8f2fba778f3
Attachment #8374177 - Flags: checkin+
Is this log allocation supposed to be the fix or is this something for your investigation here, David? This bug is tracked on FF28 and can't be assigned to "nobody" so if you're not the person working on it, reassign to another human, but otherwise we'll be looking for your feedback with this patch just landed to trunk.
Assignee: nobody → dmajor
Assignee: dmajor → benjamin
After the change in comment 17, these now show up under the OOM signature: NS_ABORT_OOM(unsigned int) | Pickle::Pickle(Pickle const&) Looks like they are generally in the hundreds-of-KB to couple-MB range.
Crash Signature: xul.dll@0x340b91 | xul.dll@0x33d43a | xul.dll@0x3dbadb | xul.dll@0x3dbd13 | xul.dll@0x3d9ce0 | xul.dll@0x3d9ede | xul.dll@0x3d9573 | xul.dll... ] → xul.dll@0x340b91 | xul.dll@0x33d43a | xul.dll@0x3dbadb | xul.dll@0x3dbd13 | xul.dll@0x3d9ce0 | xul.dll@0x3d9ede | xul.dll@0x3d9573 | xul.dll... ] [@ NS_ABORT_OOM(unsigned int) | Pickle::Pickle(Pickle const&)]
Given that data, I don't think there's anything to do here except work on memory consumption.
Assignee: benjamin → nobody
(In reply to Benjamin Smedberg [:bsmedberg] from comment #21) > Given that data, I don't think there's anything to do here except work on > memory consumption. Benjamin, do you think there's anything more QA can do to assist here or can we untrack this?
FWIW a redditor is reporting frequent, consistent crashes with this or a similar signature over at http://www.reddit.com/r/firefox/comments/1zd26f/anyone_else_having_a_problem_with_firefox_today/ Some of the crashes: b7c7a359-5ef7-427a-9eb4-9a8072140302 528b2a1d-7a14-45c9-af47-da5152140303 Redditor has reportedly defected to Chrome because of these.
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #22) > (In reply to Benjamin Smedberg [:bsmedberg] from comment #21) > > Given that data, I don't think there's anything to do here except work on > > memory consumption. > > Benjamin, do you think there's anything more QA can do to assist here or can > we untrack this? Given Benjamin's comments, I feel safe to have this untracked for all upcoming releases. That of course doesn't mean we shouldn't work on concrete cases we can figure out, for example if there's reproducible steps for trigger it. (In reply to alex_mayorga from comment #24) > Direct links as bugzilla didn't "linkify" the IDs: You need to prefix them with bp- (for breakpad, the name of our crash reporter) so for example bp-b7c7a359-5ef7-427a-9eb4-9a8072140302 works to be linked.
We have a user who is reporting this crash on SUMO (https://support.mozilla.org/en-US/questions/990515) and has provided us memory dumps.
Attached file Memory Log #2
Attachment #8394302 - Attachment description: Memory Love #2 → Memory Log #2
Crash Signature: xul.dll@0x340b91 | xul.dll@0x33d43a | xul.dll@0x3dbadb | xul.dll@0x3dbd13 | xul.dll@0x3d9ce0 | xul.dll@0x3d9ede | xul.dll@0x3d9573 | xul.dll... ] [@ NS_ABORT_OOM(unsigned int) | Pickle::Pickle(Pickle const&)] → xul.dll@0x340b91 | xul.dll@0x33d43a | xul.dll@0x3dbadb | xul.dll@0x3dbd13 | xul.dll@0x3d9ce0 | xul.dll@0x3d9ede | xul.dll@0x3d9573 | xul.dll... ] [@ NS_ABORT_OOM(unsigned int) | Pickle::Pickle(Pickle const&)] [@ mozalloc_abort | NS_DebugBreak | mozilla…
Priority: -- → P2
Moving to p3 because no activity for at least 1 year(s). See https://github.com/mozilla/bug-handling/blob/master/policy/triage-bugzilla.md#how-do-you-triage for more information
Priority: P2 → P3
Moving to p3 because no activity for at least 1 year(s). See https://github.com/mozilla/bug-handling/blob/master/policy/triage-bugzilla.md#how-do-you-triage for more information

Closing because no crashes reported for 12 weeks.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: