Closed Bug 866402 Opened 7 years ago Closed 7 years ago

crash in nsRefreshDriver::Tick

Categories

(Core :: Layout, defect, critical)

Version: 23 Branch
Hardware: x86_64
OS: macOS
Type: defect
Priority: Not set
Severity: critical

Tracking

()

RESOLVED DUPLICATE of bug 877097
mozilla23
Tracking Status
firefox22 --- unaffected
firefox23 --- disabled
firefox24 --- affected
firefox25 --- affected

People

(Reporter: scoobidiver, Assigned: spohl)

Details

(Keywords: crash, regression, topcrash)

Crash Data

It started spiking in 23.0a1/20130426 and is the #3 crasher in today's build, though with many duplicates, and it happens mostly at startup. The regression range for the spike is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=690b5e0f6562&tochange=a6104e0e5a2c
It's likely a regression from bug 753453.

Signature 	nsRefreshDriver::Tick(__int64, mozilla::TimeStamp)
UUID	00068b2b-6a42-4084-afc8-471812130427
Date Processed	2013-04-27 11:24:10
Uptime	13
Last Crash	33 seconds before submission
Install Age	17.6 hours since version was first installed.
Install Time	2013-04-26 17:46:11
Product	Firefox
Version	23.0a1
Build ID	20130426030834
Release Channel	nightly
OS	Windows NT
OS Version	6.1.7601 Service Pack 1
Build Architecture	x86
Build Architecture Info	AuthenticAMD family 16 model 4 stepping 3
Crash Reason	EXCEPTION_ACCESS_VIOLATION_READ
Crash Address	0x14
User Comments	
App Notes 	
AdapterVendorID: 0x10de, AdapterDeviceID: 0x1086, AdapterSubsysID: 120719da, AdapterDriverVersion: 9.18.13.1106
D2D? D2D+ DWrite? DWrite+ D3D10 Layers? D3D10 Layers+ 
Processor Notes 	sp-processor01.phx1.mozilla.com_5380:2012
EMCheckCompatibility	True
Adapter Vendor ID	0x10de
Adapter Device ID	0x1086
Total Virtual Memory	4294836224
Available Virtual Memory	3701518336
System Memory Use Percentage	24
Available Page File	14891028480
Available Physical Memory	6486556672

Frame 	Module 	Signature 	Source
0 	xul.dll 	nsRefreshDriver::Tick 	layout/base/nsRefreshDriver.cpp:907
1 	xul.dll 	nsTArray_Impl<nsRefPtr<nsRefreshDriver>,nsTArrayInfallibleAllocator>::AppendElem 	obj-firefox/dist/include/nsTArray.h:1044
2 	xul.dll 	nsTimerImpl::Fire 	xpcom/threads/nsTimerImpl.cpp:547
3 	xul.dll 	nsTimerEvent::Run 	xpcom/threads/nsTimerImpl.cpp:634
4 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:627
5 	xul.dll 	NS_ProcessNextEvent 	obj-firefox/xpcom/build/nsThreadUtils.cpp:238
6 	xul.dll 	mozilla::ipc::MessagePump::Run 	ipc/glue/MessagePump.cpp:82
7 	xul.dll 	MessageLoop::RunHandler 	ipc/chromium/src/base/message_loop.cc:212
8 	xul.dll 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:186
9 	xul.dll 	nsBaseAppShell::Run 	widget/xpwidgets/nsBaseAppShell.cpp:163
10 	xul.dll 	nsAppShell::Run 	widget/windows/nsAppShell.cpp:113
11 	xul.dll 	nsAppStartup::Run 	toolkit/components/startup/nsAppStartup.cpp:289
12 	xul.dll 	XREMain::XRE_mainRun 	toolkit/xre/nsAppRunner.cpp:3878
13 	xul.dll 	XREMain::XRE_main 	toolkit/xre/nsAppRunner.cpp:3945
14 	xul.dll 	XRE_main 	toolkit/xre/nsAppRunner.cpp:4157
15 	firefox.exe 	do_main 	browser/app/nsBrowserApp.cpp:271
16 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp:105
17 	firefox.exe 	__tmainCRTStartup 	crtexe.c:552
18 	kernel32.dll 	BaseThreadInitThunk 	
19 	ntdll.dll 	__RtlUserThreadStart 	
20 	ntdll.dll 	_RtlUserThreadStart 	

More reports at:
https://crash-stats.mozilla.com/report/list?signature=nsRefreshDriver%3A%3ATick%28__int64%2C+mozilla%3A%3ATimeStamp%29
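
A crash address of 0x14 together with EXCEPTION_ACCESS_VIOLATION_READ is consistent with reading a member through a null object pointer, in which case the faulting address is simply the member's offset. A minimal sketch of that arithmetic follows. `FakeDriver`, `FaultAddress`, and `ReadObservers` are hypothetical names, and the layout is invented for illustration; it is not the real `nsRefreshDriver` layout, which this report does not show.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout, for illustration only: five 32-bit fields
// followed by the member that the crashing code would read.
struct FakeDriver {
  std::uint32_t padding[5];  // occupies offsets 0x00..0x13
  std::uint32_t observers;   // lands at offset 0x14
};

// The address a read of `observers` touches for an object at `base`.
// With base == 0 (a null pointer) this is just the member offset.
inline std::uintptr_t FaultAddress(std::uintptr_t base) {
  return base + offsetof(FakeDriver, observers);
}

// Defensive read: returns false instead of dereferencing a null pointer.
inline bool ReadObservers(const FakeDriver* driver, std::uint32_t* out) {
  if (!driver) {
    return false;
  }
  *out = driver->observers;
  return true;
}
```

Under these assumptions, `FaultAddress(0)` evaluates to 0x14, matching the reported crash address, while `ReadObservers(nullptr, &value)` returns false rather than faulting.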
FYI:

nsRefreshDriver::Tick(long long, mozilla::TimeStamp) is affecting Linux.
Note long long vs __int64, otherwise the function signatures are the same.

https://crash-stats.mozilla.com/report/list?query_search=signature&range_value=4&range_unit=weeks&signature=nsRefreshDriver%3A%3ATick%28long%20long%2C%20mozilla%3A%3ATimeStamp%29
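
The signatures likely differ only in how each toolchain spells the same 64-bit integer type. A small sketch, assuming the first parameter of `Tick` is declared with a 64-bit signed typedef such as `int64_t` (the declaration itself is not quoted in this bug); `Int64Spelling` is a hypothetical helper:

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Whatever the platform calls it, the parameter is 64 bits wide.
static_assert(sizeof(std::int64_t) == 8, "int64_t is 8 bytes");
static_assert(sizeof(long long) == 8, "long long is 8 bytes on mainstream targets");

// Reports which built-in type the platform's int64_t typedef resolves to.
// A symbolizer prints the underlying type, hence the differing signatures.
inline const char* Int64Spelling() {
  if (std::is_same<std::int64_t, long>::value) {
    return "long";       // typical on 64-bit Linux (LP64, glibc)
  }
  if (std::is_same<std::int64_t, long long>::value) {
    return "long long";  // e.g. macOS and 32-bit Linux; MSVC prints this type as __int64
  }
  return "other";
}
```

Because crash signatures are built from the symbolized function name, each spelling produces a separate signature that has to be listed on the bug individually.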

For the Linux crash, can reproduce reliably just by opening a few dozen tabs.
Once the critical limit is hit, firefox reliably crashes...
If you restore a session with more tabs than the critical limit, the next tab opened will crash.

Unknown if same root cause, but this severely hinders usability on Linux.
Rather, it seems that restore of a session beyond the critical limit will also trigger this.  As such, you can get into a vicious cycle of:

Crash
Re-open
Restore
Crash
Crash Signature: [@ nsRefreshDriver::Tick(__int64, mozilla::TimeStamp)] → [@ nsRefreshDriver::Tick(__int64, mozilla::TimeStamp)] [@ nsRefreshDriver::Tick(long long, mozilla::TimeStamp)]
OS: Windows 7 → All
Whiteboard: [startupcrash]
Crash Signature: [@ nsRefreshDriver::Tick(__int64, mozilla::TimeStamp)] [@ nsRefreshDriver::Tick(long long, mozilla::TimeStamp)] → [@ nsRefreshDriver::Tick(__int64, mozilla::TimeStamp)] [@ nsRefreshDriver::Tick(long long, mozilla::TimeStamp) ]
Crash Signature: [@ nsRefreshDriver::Tick(__int64, mozilla::TimeStamp)] [@ nsRefreshDriver::Tick(long long, mozilla::TimeStamp) ] → [@ nsRefreshDriver::Tick(__int64, mozilla::TimeStamp)] [@ nsRefreshDriver::Tick(long long, mozilla::TimeStamp) ] [@ nsRefreshDriver::Tick(long, mozilla::TimeStamp) ]
Depends on: 866545
Scoobidiver, is there any way to tell whether the users seeing this have the "dom.enable_performance" preference set to false for some reason?
Flags: needinfo?(scoobidiver)
(In reply to Boris Zbarsky (:bz) from comment #3)
> Scoobidiver, is there any way to tell whether the users seeing this have the
> "dom.enable_performance" preference set to false for some reason?
We don't have that in App Notes. The only way is to contact users who left their email addresses and ask them. That's a lot of work compared to fixing bug 866545 first and seeing how crash stats behave.
Flags: needinfo?(scoobidiver)
Well, let's start with the user from comment 2.  ;)  But yes, comment 4 describes my current general plan if there is no easy way to tell what prefs look like for crashing cases.
Flags: needinfo?(linux.user.since.2002)
(In reply to Boris Zbarsky (:bz) from comment #3)
> is there any way to tell whether the users seeing this have the
> "dom.enable_performance" preference set to false for some reason?

For one:

From https://wiki.mozilla.org/Security/Reviews/Firefox/NavigationTimingAPI
Under "Conclusions / Action Items"
* [dveditz] Point the Tor folks at the pref for disabling this feature (dom.enable_performance)
* [curtisk] talk to Sid about privacy
...

Granted, this may (or may not be) partial, potentially out-of-date information (for one, Sid's reply would be nice to know); however, it is still online without any pointers to anything newer.  As an aside, it can be hard to make an informed decision about every single entry in about:config, particularly if information proves scarce.  IIRC, took a week or so when I attempted that last.  However, for this pref, a quick google search for the pref name ("dom.enable_performance") returned the mozilla page in the top results (within the first few results on the first page and appearing above the fold).

Speculatively, users may have disabled it themselves or perhaps some addons are disabling it on behalf of users, likely including those Tor related as noted in the conclusions on that page.
Flags: needinfo?(linux.user.since.2002)
NO, you don't understand.  You said you can reproduce.  Do _you_ have this pref set?
Flags: needinfo?(linux.user.since.2002)
(In reply to linux.user.since.2002 from comment #1)
> If you restore a session with more tabs than the critical limit, the next
> tab opened will crash.

(In reply to linux.user.since.2002 from comment #2)
> Rather, it seems that restore of a session beyond the critical limit will
> also trigger this.

Regarding session restore, it seems there are some cases where the restore will trigger the crash and some cases where it will not, even if the number of tabs restored is well over the critical limit.  Again, this does not appear to be specific to session restore, as starting a new session and just opening tabs will crash firefox after a few dozen or so.
Flags: needinfo?(linux.user.since.2002)
(In reply to Boris Zbarsky (:bz) from comment #7)
> Do _you_ have this pref set?

To state explicitly, I have:
dom.enable_performance = false
There have been no crashes since 23.0a1/20130503 so it's fully fixed by the patch of bug 866545.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
(In reply to Scoobidiver from comment #10)
> There have been no crashes since 23.0a1/20130503 so it's fully fixed by the
> patch of bug 866545.
My bad. There are still a few on Mac: https://crash-stats.mozilla.com/report/list?signature=nsRefreshDriver%3A%3ATick%28long+long%2C+mozilla%3A%3ATimeStamp%29
Status: RESOLVED → REOPENED
Crash Signature: [@ nsRefreshDriver::Tick(__int64, mozilla::TimeStamp)] [@ nsRefreshDriver::Tick(long long, mozilla::TimeStamp) ] [@ nsRefreshDriver::Tick(long, mozilla::TimeStamp) ] → [@ nsRefreshDriver::Tick(__int64, mozilla::TimeStamp)] [@ nsRefreshDriver::Tick(long long, mozilla::TimeStamp) ] [@ nsRefreshDriver::Tick(long, mozilla::TimeStamp) ] [@ @0x0 | nsRefreshDriver::Tick(long long, mozilla::TimeStamp) ]
OS: All → Mac OS X
Hardware: All → x86_64
Resolution: FIXED → ---
For what it's worth, those seem to have completely bogus stacks.
It's the #4 browser crasher in 23.0a2 and #9 in 24.0a1 on Mac OS X.
Crash Signature: [@ nsRefreshDriver::Tick(__int64, mozilla::TimeStamp)] [@ nsRefreshDriver::Tick(long long, mozilla::TimeStamp) ] [@ nsRefreshDriver::Tick(long, mozilla::TimeStamp) ] [@ @0x0 | nsRefreshDriver::Tick(long long, mozilla::TimeStamp) ] → [@ nsRefreshDriver::Tick(__int64, mozilla::TimeStamp)] [@ nsRefreshDriver::Tick(long long, mozilla::TimeStamp) ] [@ nsRefreshDriver::Tick(long, mozilla::TimeStamp) ] [@ @0x0 | nsRefreshDriver::Tick(long long, mozilla::TimeStamp) ] [@ jemalloc_crash | …
I hit this yesterday and I have dom.enable_performance set to true.

Running 24.0a1 (2013-06-15) UX on a Mac.
There are two kinds of stack trace:
Frame 	Module 	Signature 	Source
0 		@0x400000001 	
1 	XUL 	nsRefreshDriver::Tick(long long, mozilla::TimeStamp) 	obj-firefox/x86_64/dist/include/nsAutoPtr.h
2 	XUL 	mozilla::RefreshDriverTimer::Tick() 	layout/base/nsRefreshDriver.cpp
3 	XUL 	nsTimerImpl::Fire() 	xpcom/threads/nsTimerImpl.cpp
4 	XUL 	nsTimerEvent::Run() 	xpcom/threads/nsTimerImpl.cpp
5 	XUL 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp
6 	XUL 	NS_ProcessPendingEvents(nsIThread*, unsigned int) 	obj-firefox/x86_64/xpcom/build/nsThreadUtils.cpp
7 	XUL 	nsBaseAppShell::NativeEventCallback() 	widget/xpwidgets/nsBaseAppShell.cpp
8 	XUL 	nsAppShell::ProcessGeckoEvents(void*) 	widget/cocoa/nsAppShell.mm

0 	XUL 	nsRefreshDriver::Tick(long long, mozilla::TimeStamp) 	obj-firefox/x86_64/dist/include/nsAutoPtr.h
1 	XUL 	mozilla::InactiveRefreshDriverTimer::TickOne() 	layout/base/nsRefreshDriver.cpp
2 	XUL 	nsTimerImpl::Fire() 	xpcom/threads/nsTimerImpl.cpp
3 	XUL 	nsTimerEvent::Run() 	xpcom/threads/nsTimerImpl.cpp
4 	XUL 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp
5 	XUL 	NS_ProcessPendingEvents(nsIThread*, unsigned int) 	obj-firefox/x86_64/xpcom/build/nsThreadUtils.cpp
6 	XUL 	nsBaseAppShell::NativeEventCallback() 	widget/xpwidgets/nsBaseAppShell.cpp
7 	XUL 	nsAppShell::ProcessGeckoEvents(void*) 	widget/cocoa/nsAppShell.mm
Blocks: 877097
Crash Signature: , mozilla::TimeStamp) ] [@ jemalloc_crash | libsystem_c.dylib@0x2d8f8 ] [@ jemalloc_crash | libsystem_c.dylib@0xa0789 ] → , mozilla::TimeStamp) ] [@ jemalloc_crash | libsystem_c.dylib@0x2d8f8 ] [@ jemalloc_crash | libsystem_c.dylib@0xa0789 ] [@ jemalloc_crash | arena_dalloc | libsystem_c.dylib@0x2d8f8 ]
Emailed the Layout team for help with assignee.

Scoobidiver, can you please help with info on how this is related to bug 877097? Thanks!
Flags: needinfo?(scoobidiver)
So, FWIW, the dom.enable_performance thing appears to be unrelated.  That was bug 866545, which had a reproducible test case which crashed on all platforms.  This appears to be Mac specific and running the test case in that bug does not trigger a crash anymore.  This crash signature is a generic signature for crashing when trying to call a refresh observer, so while this bug was originally filed around the same time as bug 866545 and probably covered that problem originally the crashes we're seeing today are probably from a different cause.
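
To illustrate why a crash while calling a refresh observer yields such a generic signature, here is a minimal sketch of a driver that ticks a list of observers. All names (`Observer`, `Driver`, `WillRefresh`, `AddObserver`) are hypothetical, and the use of `std::weak_ptr` is an assumption chosen for the example; it is not how Gecko manages observer lifetimes and not the fix discussed in this bug.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a refresh observer.
struct Observer {
  int ticks = 0;
  void WillRefresh() { ++ticks; }
};

// Hypothetical driver that holds its observers weakly, so an observer
// that has already been destroyed is skipped instead of being called.
class Driver {
 public:
  void AddObserver(const std::shared_ptr<Observer>& obs) {
    observers_.push_back(obs);
  }

  // Calls every observer that is still alive; returns how many were called.
  int Tick() {
    int called = 0;
    for (const std::weak_ptr<Observer>& weak : observers_) {
      if (std::shared_ptr<Observer> obs = weak.lock()) {
        obs->WillRefresh();
        ++called;
      }
    }
    return called;
  }

 private:
  std::vector<std::weak_ptr<Observer>> observers_;
};
```

If the list held raw pointers instead, a tick after one observer had been destroyed would call through a dangling pointer. The crash would then be attributed to the tick function whichever observer was at fault, which is why unrelated root causes can share this one signature.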
A comment says: "Facebook….I had just clicked on a picture to see detail (and this is the first ffox had crashed on the new beta)"

(In reply to bhavana bajaj [:bajaj] from comment #18) 
> Scoobidiver, can you please help with info on how is this related to 877097
Bug 877097's stack trace has two more frames than this one but the rest is the same. In addition, both show up around the same build.
Flags: needinfo?(scoobidiver)
spohl - do you agree that it makes sense to investigate this issue at the same time as bug 877097 given comment 20?
Flags: needinfo?(spohl.mozilla.bugs)
(In reply to Alex Keybl [:akeybl] from comment #21)
> spohl - do you agree that it makes sense to investigate this issue at the
> same time as bug 877097 given comment 20?

Yes. Bug 877097 is next on my list and I agree that it may be the same problem we're seeing here. I'll look into it.
Assignee: nobody → spohl.mozilla.bugs
Flags: needinfo?(spohl.mozilla.bugs)
Stephen, do we have any progress on your investigations there? This is still the #1 issue on Mac in Beta 23.
There is a chance that bug 868498 will fix bug 877097, which may just be the same bug as this one here. I just pushed the patches for bug 868498 to inbound. I will monitor the crash stats in the next few days. I'll also try and debug this some more.
Bug 868498 landed in the beta range pointed to, but landed on aurora already on 2013-07-17. Scoobidiver, you say "maybe" on the Aurora working build, could it be the one from the 18th instead where it started to work?
I have a feeling that this crash disappeared due to the landing of bug 895203 in beta and that it might still be present in aurora and nightly, same as bug 877097. Scoobidiver, is that something you can confirm or do we need to wait a few more days?
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #26)
> Scoobidiver, you say "maybe" on the Aurora working build, could it be the one from the
> 18th instead where it started to work?
"Maybe" meant "too soon to conclude it's fixed in Aurora based on the low volume of Mac users" but in any case it is fully fixed earlier than 24.0a2/20130720. See bp-7e080d29-6e1d-4002-9a1f-6f1752130719.

(In reply to Stephen Pohl [:spohl] from comment #27)
> Scoobidiver, is that something you can confirm or do we need to wait a few more days?
It's fixed in Beta 7 but not sure in Aurora as there are no Mac builds for 24.0a2/20130721 (see ftp://ftp.mozilla.org/pub/firefox/nightly/2013-07-21-00-40-03-mozilla-aurora/) and 24.0a2/20130722 (see ftp://ftp.mozilla.org/pub/firefox/nightly/2013-07-22-00-40-04-mozilla-aurora/).
Updating tracking flags for FF23 based on comment 28.
Depends on: 895203
If I'm reading the crash stats right, this crash hasn't occurred since build 25.0a1 20130722030226, which seems to (roughly) coincide with the landing of patch part 2 in bug 877097. This has me a bit puzzled, but I think we might be able to close this bug here while we're adding additional safeguards in bug 877097 (see bug 877097 comment 39). Scoobidiver, what do you think?
Flags: needinfo?(scoobidiver)
(In reply to Stephen Pohl [:spohl] from comment #30)
> This has me a bit puzzled, but I think we might be able to close this bug here while
> we're adding additional safeguards in bug 877097 (see bug 877097 comment 39).
> Scoobidiver, what do you think?
Yes. Bug 877097 and this bug track the same issue.
No longer blocks: 877097
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Flags: needinfo?(scoobidiver)
Resolution: --- → DUPLICATE
Duplicate of bug: 877097