Closed Bug 897870 Opened 6 years ago Closed 3 months ago

crash in mozilla::Logger::~Logger with abort message: "file e:/builds/.../ipc/chromium/src/base/pickle.cc, line 136" (Pickle::Pickle)

Categories

(Core :: IPC, defect, P3, critical)

25 Branch
All
Windows 7
defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox24 --- unaffected
firefox25 --- wontfix
firefox26 --- wontfix
firefox27 + wontfix
firefox28 --- wontfix
firefox29 --- affected

People

(Reporter: scoobidiver, Unassigned)

Details

(Keywords: crash, regression, topcrash, Whiteboard: [leave open])

Crash Data

Attachments

(4 files, 1 obsolete file)

It started spiking in 25.0a1/20130724. The regression range for the spike is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=5ceea82a79c7&tochange=2983ca6d4d1a
It might be a regression from bug 884061.

Signature 	mozalloc_abort(char const* const) | NS_DebugBreak | mozilla::Logger::~Logger() More Reports Search
UUID 	8720cf38-027e-4bad-b5b2-cf0c32130724
Date Processed	2013-07-24 16:12:27.638528
Uptime	82
Last Crash	1103052 seconds before submission
Install Age 	82 since version was first installed.
Install Time 	2013-07-24 16:09:53
Product 	Firefox
Version 	25.0a1
Build ID 	20130724030204
Release Channel 	nightly
OS 	Windows NT
OS Version 	6.1.7601 Service Pack 1
Build Architecture 	x86
Build Architecture Info 	AuthenticAMD family 16 model 4 stepping 2 | 4
Crash Reason 	EXCEPTION_BREAKPOINT
Crash Address 	0x72f511cd
App Notes 	
AdapterVendorID: 0x10de, AdapterDeviceID: 0x11c6, AdapterSubsysID: 35541458, AdapterDriverVersion: 9.18.13.1422
D2D? D2D+ DWrite? DWrite+ D3D10 Layers? D3D10 Layers+ xpcom_runtime_abort([Parent 2444] ###!!! ABORT: file e:/builds/moz2_slave/m-cen-w32-ntly-000000000000000/build/ipc/chromium/src/base/pickle.cc, line 136)
Adapter Vendor ID 	0x10de
Adapter Device ID 	0x11c6
Total Virtual Memory 	2726166528
Available Virtual Memory 	331247616
System Memory Use Percentage 	94
Available Page File 	13847392256
Available Physical Memory 	169476096

Frame 	Module 	Signature 	Source
0 	mozalloc.dll 	mozalloc_abort(char const * const) 	memory/mozalloc/mozalloc_abort.cpp
1 	xul.dll 	NS_DebugBreak 	xpcom/base/nsDebugImpl.cpp
2 	xul.dll 	mozilla::Logger::~Logger() 	ipc/chromium/src/base/logging.cc
3 	xul.dll 	Pickle::Pickle(Pickle const &) 	ipc/chromium/src/base/pickle.cc
4 	xul.dll 	IPC::Message::Message(IPC::Message const &) 	ipc/chromium/src/chrome/common/ipc_message.cc
5 	xul.dll 	std::deque<IPC::Message,std::allocator<IPC::Message> >::push_back(IPC::Message const &) 	c:/tools/msvs10/vc/include/deque
6 	xul.dll 	mozilla::ipc::RPCChannel::OnMessageReceivedFromLink(IPC::Message const &) 	ipc/glue/RPCChannel.cpp
7 	xul.dll 	mozilla::ipc::AsyncChannel::ProcessLink::OnMessageReceived(IPC::Message const &) 	ipc/glue/AsyncChannel.cpp
8 	xul.dll 	IPC::Channel::ChannelImpl::ProcessIncomingMessages(base::MessagePumpForIO::IOContext *,unsigned long) 	ipc/chromium/src/chrome/common/ipc_channel_win.cc
9 	xul.dll 	IPC::Channel::ChannelImpl::OnIOCompleted(base::MessagePumpForIO::IOContext *,unsigned long,unsigned long) 	ipc/chromium/src/chrome/common/ipc_channel_win.cc
10 	xul.dll 	base::MessagePumpForIO::WaitForIOCompletion(unsigned long,base::MessagePumpForIO::IOHandler *) 	ipc/chromium/src/base/message_pump_win.cc
11 	xul.dll 	base::MessagePumpForIO::DoRunLoop() 	ipc/chromium/src/base/message_pump_win.cc
12 	xul.dll 	base::MessagePumpWin::RunWithDispatcher(base::MessagePump::Delegate *,base::MessagePumpWin::Dispatcher *) 	ipc/chromium/src/base/message_pump_win.cc
13 	xul.dll 	base::MessagePumpWin::Run(base::MessagePump::Delegate *) 	ipc/chromium/src/base/message_pump_win.h
14 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc
15 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc
16 	xul.dll 	base::Thread::ThreadMain() 	ipc/chromium/src/base/thread.cc
17 	xul.dll 	`anonymous namespace'::ThreadFunc(void *) 	tools/profiler/platform-win32.cc
18 	kernel32.dll 	BaseThreadInitThunk 	
19 	ntdll.dll 	__RtlUserThreadStart 	
20 	ntdll.dll 	_RtlUserThreadStart

More reports at:
https://crash-stats.mozilla.com/report/list?product=Firefox&signature=mozalloc_abort%28char+const*+const%29+|+NS_DebugBreak+|+mozilla%3A%3ALogger%3A%3A~Logger%28%29
https://crash-stats.mozilla.com/report/list?product=Firefox&signature=mozalloc_abort%28char+const*+const%29+|+NS_DebugBreak
It's #12 browser crasher in 25.0a2 and #60 in 26.0a1.
This is currently #8 on 27 betas and has increased in volume by a factor of 3-4 between 27beta2 and 27beta4/5.

Nothing in correlations.
This is an allocation failure. It looks like real OOM, not heap corruption. I do wonder whether we're pickling small things or whether we're sending potentially large messages across. But probably there's not much to do about this except figure out memory issues.
This spiked significantly on the beta channel when 27 beta 1 was pushed out. It went from ~3 crashes per million ADI in Firefox 26 beta to 100-120 crashes per ADI in Firefox 27 beta.

Benjamin, given your comment #3, is this something we need to have someone look at?
Flags: needinfo?(benjamin)
I can't tell how big these allocations are. From code inspection, the allocation size ought to be other.header_size_ + other.header_->payload_size. 

other.header_size_ is always 0x1c, but we don't have the memory for payload_size (and it doesn't appear to have any remnants in stack or registers either).
https://crash-analysis.mozilla.com/bsmedberg/LoggerAbort-nightly.svg

This spiked on nightly when we fixed the breakpad reservation (bug 943051) and I suspect that it spiked on beta at the same time: remember that we landed the memory reservation into the last beta for FF26. So I don't think this is something to worry about in FF27 (just moving from empty-dump).

We should however use NS_ABORT_OOM to annotate how big the buffer actually is, to determine whether there are very large IPC buffers that there shouldn't be.
Flags: needinfo?(benjamin)
> mozalloc_abort(char const* const) | NS_DebugBreak | mozilla::Logger::~Logger()

This signature has risen to the #9 topcrash on Aurora 28 and now accounts for 1.12% of crashes in Firefox 28. The following is the current 7-day ranks.

Firefox 26: n/a
Firefox 27:  #7 @ 0.90%
Firefox 28:  #9 @ 1.12% 
Firefox 29  #47 @ 0.30%

I'm not sure what has caused this to rise but I'm nominating for tracking.
Comment on attachment 8368055 [details] [diff] [review]
Log allocation size on failure in Pickle::Pickle, rev. 1

I'm tempted to suggest doing the check in Resize -- it would help avoid the copy/paste of the size parameter, and catch a couple other constructors that would null-crash today.

But that would change the fallibility of the call from BeginWrite [1] and I don't know if that's desirable. What do you think?

[1] http://mxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/pickle.cc#486
If all of the callers treat it as infallible, that doesn't sound like a problem.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #10)
> If all of the callers treat it as infallible, that doesn't sound like a
> problem.

BeginWrite (and its callers, and so on) appear to be properly plumbed for fallible allocations.
Tracking this based on Comment 8
Comment on attachment 8368055 [details] [diff] [review]
Log allocation size on failure in Pickle::Pickle, rev. 1

Review of attachment 8368055 [details] [diff] [review]:
-----------------------------------------------------------------

Ok let's not worry about BeginWrite for now.

I do see some crash reports in Pickle::Pickle(int) so that might be worth adding too.
Attachment #8368055 - Flags: review?(dmajor) → review+
Mozilla/5.0 (Windows NT 6.3; rv:29.0) Gecko/20100101 Firefox/29.0

I've crashed Firefox Aurora (build ID: 20140206004003) several times on Windows 8.1 32bit.

STR:
1. Open Firefox and go to Options -> Tabs and un-check "Don't load tabs until selected"
2. Open 3 tabs and close Firefox.
3. Start Firefox and restore previous session.
4. Open a new tab.
5. Right click on the new tab and select close tab.
6. Right click on one of the tabs.
7. (optional - sometimes it crashes after previous step) Create a new tab and right click on it -> Close tab.

Results: Firefox crashes.

This is intermittent, I was able to reproduce it only 5 times out of 20+ attempts.
I could not reproduce this on Windows 8.1 64bit.

Reports: 
bp-6edeeb1a-398a-468e-b4a0-1057b2140207
bp-5d8c83ba-758c-4516-9d5f-8cbff2140207
bp-890d48e8-cb0a-4494-8e9c-442e72140207
bp-a6ed8467-e192-4b9a-ac33-bdf832140207
bp-cc4d212d-6e66-4c0e-aa8b-5e2f22140207
Keywords: checkin-needed
Whiteboard: [leave open]
Comment on attachment 8374177 [details] [diff] [review]
Log allocation size on failure in Pickle::Pickle, rev. 1.1

https://hg.mozilla.org/integration/mozilla-inbound/rev/f8f2fba778f3
Attachment #8374177 - Flags: checkin+
Is this log allocation supposed to be the fix or is this something for your investigation here, David?  This bug is tracked on FF28 and can't be assigned to "nobody" so if you're not the person working on it, reassign to another human, but otherwise we'll be looking for your feedback with this patch just landed to trunk.
Assignee: nobody → dmajor
Assignee: dmajor → benjamin
After the change in comment 17, these now show up under the OOM signature: NS_ABORT_OOM(unsigned int) | Pickle::Pickle(Pickle const&)

Looks like they are generally in the hundreds-of-KB to couple-MB range.
Crash Signature: xul.dll@0x340b91 | xul.dll@0x33d43a | xul.dll@0x3dbadb | xul.dll@0x3dbd13 | xul.dll@0x3d9ce0 | xul.dll@0x3d9ede | xul.dll@0x3d9573 | xul.dll... ] → xul.dll@0x340b91 | xul.dll@0x33d43a | xul.dll@0x3dbadb | xul.dll@0x3dbd13 | xul.dll@0x3d9ce0 | xul.dll@0x3d9ede | xul.dll@0x3d9573 | xul.dll... ] [@ NS_ABORT_OOM(unsigned int) | Pickle::Pickle(Pickle const&)]
Given that data, I don't think there's anything to do here except work on memory consumption.
Assignee: benjamin → nobody
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #21)
> Given that data, I don't think there's anything to do here except work on
> memory consumption.

Benjamin, do you think there's anything more QA can do to assist here or can we untrack this?
FWIW a redditor is reporting frequent, consistent crashes with this or a similar signature over at http://www.reddit.com/r/firefox/comments/1zd26f/anyone_else_having_a_problem_with_firefox_today/

Some of the crashes:

b7c7a359-5ef7-427a-9eb4-9a8072140302
528b2a1d-7a14-45c9-af47-da5152140303

Redditor has reportedly defected to Chrome because of these.
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #22)
> (In reply to Benjamin Smedberg  [:bsmedberg] from comment #21)
> > Given that data, I don't think there's anything to do here except work on
> > memory consumption.
> 
> Benjamin, do you think there's anything more QA can do to assist here or can
> we untrack this?

Given Benjamin's comments, I feel safe to have this untracked for all upcoming releases.

That of course doesn't mean we shouldn't work on concrete cases we can figure out, for example if there's reproducible steps for trigger it.


(In reply to alex_mayorga from comment #24)
> Direct links as bugzilla didn't "linkify" the IDs:

You need to prefix them with bp- (for breakpad, the name of our crash reporter) so for example bp-b7c7a359-5ef7-427a-9eb4-9a8072140302 works to be linked.
We have a user who is reporting this crash on SUMO (https://support.mozilla.org/en-US/questions/990515) and has provided us memory dumps.
Attached file Memory Log #2
Attachment #8394302 - Attachment description: Memory Love #2 → Memory Log #2
Crash Signature: xul.dll@0x340b91 | xul.dll@0x33d43a | xul.dll@0x3dbadb | xul.dll@0x3dbd13 | xul.dll@0x3d9ce0 | xul.dll@0x3d9ede | xul.dll@0x3d9573 | xul.dll... ] [@ NS_ABORT_OOM(unsigned int) | Pickle::Pickle(Pickle const&)] → xul.dll@0x340b91 | xul.dll@0x33d43a | xul.dll@0x3dbadb | xul.dll@0x3dbd13 | xul.dll@0x3d9ce0 | xul.dll@0x3d9ede | xul.dll@0x3d9573 | xul.dll... ] [@ NS_ABORT_OOM(unsigned int) | Pickle::Pickle(Pickle const&)] [@ mozalloc_abort | NS_DebugBreak | mozilla…
Priority: -- → P2
Moving to p3 because no activity for at least 1 year(s).
See https://github.com/mozilla/bug-handling/blob/master/policy/triage-bugzilla.md#how-do-you-triage for more information
Priority: P2 → P3
Moving to p3 because no activity for at least 1 year(s).
See https://github.com/mozilla/bug-handling/blob/master/policy/triage-bugzilla.md#how-do-you-triage for more information

Closing because no crashes reported for 12 weeks.

Status: NEW → RESOLVED
Closed: 3 months ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.