crash @ RtlEnterCriticalSection | PR_Lock | nsThread::Dispatch | nsTimerImpl::PostTimerEvent

RESOLVED FIXED in Firefox 22

Status

()

Core
XPCOM
--
critical
RESOLVED FIXED
6 years ago
5 years ago

People

(Reporter: Scoobidiver (away), Assigned: aklotz)

Tracking

({crash, topcrash})

18 Branch
mozilla24
x86
Windows XP
crash, topcrash
Points:
---

Firefox Tracking Flags

(firefox21 wontfix, firefox22+ fixed, firefox23+ verified, firefox24+ verified)

Details

(crash signature)

Attachments

(4 attachments, 1 obsolete attachment)

(Reporter)

Description

6 years ago
It's #34 top browser crasher in 11.0, #31 in 12.0b4, #46 in 13.0a2 and #113 in 14.0a1.

On comment says on April 12th: "This crash happened after the latest mcrosoft update" bp-13f916c4-c46f-47ee-83d2-92cf22120412

Signature 	RtlEnterCriticalSection | PR_Lock | nsThread::Dispatch(nsIRunnable*, unsigned int) More Reports Search
UUID	3cc13d95-0344-4e93-854e-3840a2120412
Date Processed	2012-04-12 11:45:27
Uptime	98
Install Age	8.1 hours since version was first installed.
Install Time	2012-04-12 03:41:16
Product	Firefox
Version	13.0a2
Build ID	20120411042011
Release Channel	aurora
OS	Windows NT
OS Version	5.1.2600 Service Pack 2
Build Architecture	x86
Build Architecture Info	GenuineIntel family 15 model 2 stepping 4
Crash Reason	EXCEPTION_ACCESS_VIOLATION_READ
Crash Address	0x30
App Notes 	
AdapterVendorID: 0x10de, AdapterDeviceID: 0x002d, AdapterSubsysID: 001e10de, AdapterDriverVersion: 5.6.7.3
D3D10 Layers? D3D10 Layers- D3D9 Layers? D3D9 Layers- 
EMCheckCompatibility	True	
Total Virtual Memory	2147352576
Available Virtual Memory	1935941632
System Memory Use Percentage	31
Available Page File	1744617472
Available Physical Memory	553472000

Frame 	Module 	Signature 	Source
0 	ntdll.dll 	RtlEnterCriticalSection 	
1 	nspr4.dll 	PR_Lock 	nsprpub/pr/src/threads/combined/prulock.c:233
2 	xul.dll 	nsThread::Dispatch 	xpcom/threads/nsThread.cpp:440
3 	xul.dll 	nsTimerImpl::PostTimerEvent 	xpcom/threads/nsTimerImpl.cpp:628
4 	xul.dll 	TimerThread::Run 	xpcom/threads/TimerThread.cpp:319
5 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:657
6 	xul.dll 	nsThread::ThreadFunc 	xpcom/threads/nsThread.cpp:289
7 	nspr4.dll 	_PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c:426
8 	nspr4.dll 	pr_root 	nsprpub/pr/src/md/windows/w95thred.c:122
9 	msvcr100.dll 	_callthreadstartex 	f:\dd\vctools\crt_bld\self_x86\crt\src\threadex.c:314
10 	msvcr100.dll 	_threadstartex 	f:\dd\vctools\crt_bld\self_x86\crt\src\threadex.c:292
11 	kernel32.dll 	BaseThreadStart 	

More reports at:
https://crash-stats.mozilla.com/report/list?signature=RtlEnterCriticalSection+|+PR_Lock+|+nsThread%3A%3ADispatch%28nsIRunnable*%2C+unsigned+int%29

Comment 1

6 years ago
Created attachment 660574 [details]
test audio file for crash testcase.

This is a test audio file for a testcase that I am uploading. There is nothing special about the audio file itself, it's only useful since it's short and can be looped quickly to help find an intermittent error.

Updated

6 years ago
Attachment #660574 - Attachment mime type: application/octet-stream → audio/ogg

Comment 2

6 years ago
Created attachment 660578 [details]
Crash testcase. loops audio file leadin to a crash.

I hope this is the right bug to attach this testcase to(the signature matches the crash I'm getting: https://crash-stats.mozilla.com/report/index/bp-b1c8c6b4-a521-4c87-8be9-051ec2120912)

The crash should occur soon after pressing "play" on the <audio> element. It will loop repeatedly until it crashes; sometimes this occurs quickly, sometimes it takes several minutes or more. This will crash on two computers I've tested, both running FF 15.0.1, another running the current Aurora build; one is under windows 7, the other windows XP. The crash occurs as the song ends and a "new" source is set. It only occurs when setting the src element after playing; other tests looping without setting source had no problem, nor did setting source repeatedly without playing cause a problem.

Updated

6 years ago
Attachment #660578 - Attachment mime type: text/plain → text/html
(Reporter)

Comment 3

6 years ago
It doesn't crash for me.

Comment 4

6 years ago
Interesting. How long did you let it run? I've had one going for ~20 minutes now and it hasn't crashed(though it did earlier several times); but on another machine it just crashed within probably less than a minute. I should also note that I've only tested the crash under single-core systems(one a laptop running Windows 7, another a vmware instance running Windows XP on top of Ubuntu 10.04, restricted to one core). Testing it now under a dual-core vmware instance.

Comment 5

6 years ago
The instance mentioned in my prior comment that had run ~20 minutes, crashed at ~35 minutes; so clearly, my earlier assertion that it should fail "soon" isn't always true(though so far it does seem to fail soon most of the time for me: after failing at ~35 minutes, it failed again several times in a row in just seconds). A dual-core WindowsXP instance under vmware crashed at around 38 minutes, then again at 14 minutes.
I have a user still experiencing this in 18, https://crash-stats.mozilla.com/report/index/ccdd4bbe-2f5c-4096-b506-2448d2130116, any progress or should we do 1:1?
Version: 11 Branch → 18 Branch

Comment 7

5 years ago
This signature is now entering the top 10 crashes on Aurora 21.

Comment 8

5 years ago
Ehsan, do you have any clues about this one? It appears that we're posting an event to a dead thread, but that would mean that we have a refcounting error (because TimerImpl::mEventTarget is a strong ref). Otherwise this means that we're dispatching from a dead nsTimerImpl?
(Assignee)

Updated

5 years ago
Assignee: nobody → aklotz
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #8)
> Ehsan, do you have any clues about this one? It appears that we're posting
> an event to a dead thread, but that would mean that we have a refcounting
> error (because TimerImpl::mEventTarget is a strong ref). Otherwise this
> means that we're dispatching from a dead nsTimerImpl?

NS_GetCurrentThread() can return null, right?  What if mEventTarget is just null?  (That doesn't seem that implausible given the crash address of 0x20.)

Comment 10

5 years ago
Yeah, I somehow missed that clue.

NS_GetCurrentThread should only be able to return null before nsThreadManager::Init or after nsThreadManager::Shutdown. So really mEventTarget should never have the opportunity to be null in normal operation. Perhaps somebody is trying to create a timer very early in startup?

So I guess we can wallpaper this by just null-checking mEventTarget in nsTimerImpl::PostTimerEvent, but we should probably also check it nsTimerImpl::InitCommon.
(Assignee)

Comment 11

5 years ago
Created attachment 724498 [details] [diff] [review]
null checking for mEventTarget

This theory makes perfect sense to me as well.
Attachment #724498 - Flags: review?(benjamin)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #10)
> Yeah, I somehow missed that clue.
> 
> NS_GetCurrentThread should only be able to return null before
> nsThreadManager::Init or after nsThreadManager::Shutdown. So really
> mEventTarget should never have the opportunity to be null in normal
> operation. Perhaps somebody is trying to create a timer very early in
> startup?

That seems implausible given the main thread stack in the crash report in comment 6, IINM:

0 	ntdll.dll 	NtWaitForSingleObject 	
1 	KERNELBASE.dll 	WaitForSingleObjectEx 	
2 	kernel32.dll 	WaitForSingleObjectExImplementation 	
3 	kernel32.dll 	WaitForSingleObject 	
4 	nspr4.dll 	PR_Wait 	nsprpub/pr/src/threads/prmon.c:152
5 	xul.dll 	mozilla::ReentrantMonitor::Wait 	obj-firefox/dist/include/mozilla/ReentrantMonitor.h:90
6 	xul.dll 	nsThreadStartupEvent::Wait 	xpcom/threads/nsThread.cpp:174
7 	xul.dll 	nsThread::Init 	xpcom/threads/nsThread.cpp:342
8 	xul.dll 	nsThreadManager::NewThread 	xpcom/threads/nsThreadManager.cpp:215
9 	xul.dll 	NS_NewThread_P 	obj-firefox/xpcom/build/nsThreadUtils.cpp:49
10 	xul.dll 	NS_NewNamedThread_P<12> 	obj-firefox/dist/include/nsThreadUtils.h:88
11 	xul.dll 	StateMachineTracker::EnsureGlobalStateMachine 	content/media/nsBuiltinDecoderStateMachine.cpp:267
12 	xul.dll 	nsBuiltinDecoderStateMachine::nsBuiltinDecoderStateMachine 	content/media/nsBuiltinDecoderStateMachine.cpp:413
13 	xul.dll 	nsOggDecoder::CreateStateMachine 	content/media/ogg/nsOggDecoder.cpp:13
14 	xul.dll 	nsBuiltinDecoder::Load 	content/media/nsBuiltinDecoder.cpp:328
15 	xul.dll 	nsHTMLMediaElement::FinishDecoderSetup 	content/html/content/src/nsHTMLMediaElement.cpp:2665
16 	xul.dll 	nsHTMLMediaElement::InitializeDecoderForChannel 	content/html/content/src/nsHTMLMediaElement.cpp:2640
17 	xul.dll 	nsHTMLMediaElement::MediaLoadListener::OnStartRequest 	content/html/content/src/nsHTMLMediaElement.cpp:345
18 	xul.dll 	mozilla::net::nsHttpChannel::CallOnStartRequest 	netwerk/protocol/http/nsHttpChannel.cpp:962
19 	xul.dll 	mozilla::net::nsHttpChannel::ContinueOnStartRequest2 	netwerk/protocol/http/nsHttpChannel.cpp:4859
20 	xul.dll 	mozilla::net::nsHttpChannel::OnStartRequest 	netwerk/protocol/http/nsHttpChannel.cpp:4832
21 	xul.dll 	nsInputStreamPump::OnStateStart 	netwerk/base/src/nsInputStreamPump.cpp:417
22 	xul.dll 	nsInputStreamReadyEvent::Run 	xpcom/io/nsStreamUtils.cpp:82
23 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:620
24 	xul.dll 	mozilla::ipc::MessagePump::Run 	ipc/glue/MessagePump.cpp:117
25 	xul.dll 	MessageLoop::RunHandler 	ipc/chromium/src/base/message_loop.cc:208
26 	xul.dll 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:182
27 	xul.dll 	nsBaseAppShell::Run 	widget/xpwidgets/nsBaseAppShell.cpp:163
28 	xul.dll 	nsAppShell::Run 	widget/windows/nsAppShell.cpp:232
29 	xul.dll 	nsAppStartup::Run 	toolkit/components/startup/nsAppStartup.cpp:290
30 	xul.dll 	XREMain::XRE_mainRun 	toolkit/xre/nsAppRunner.cpp:3794
31 	xul.dll 	XREMain::XRE_main 	toolkit/xre/nsAppRunner.cpp:3860
32 	xul.dll 	XRE_main 	toolkit/xre/nsAppRunner.cpp:3935
33 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp:105

I think we should look at the crash dump and try to see what we're missing.  This _could_ be a memory corruption or something like that too.  I don't think we should be too quick to take the wallpaper patch.
(Assignee)

Comment 13

5 years ago
Fair enough, I'll keep digging.
(Assignee)

Comment 14

5 years ago
I might be completely out to lunch on this, but I offer some food for thought:

As we can see in the dump from comment 6, the main thread is in the midst of creating the state machine thread in StateMachineTracker::EnsureGlobalStateMachine. 

OTOH, there is a timer being set in nsBuiltinDecoderStateMachine::ScheduleStateMachine. nsITimer::SetTarget is being called with the return value from GetStateMachineThread. Since the state machine thread hasn't been created yet, GetStateMachineThread would be nullptr, thus feeding a mythical NULL mEventTarget into the timer.

There are numerous checks to ensure that this doesn't happen but they are all macros that are no-ops in non-debug builds AFAICT.

The one problem with this theory is that it's not clear to me who could be calling ScheduleStateMachine at this point. It looks like it's so early in the decoder initialization that this can't happen.
(In reply to Aaron Klotz [:aklotz] from comment #14)
> I might be completely out to lunch on this, but I offer some food for
> thought:
> 
> As we can see in the dump from comment 6, the main thread is in the midst of
> creating the state machine thread in
> StateMachineTracker::EnsureGlobalStateMachine. 
> 
> OTOH, there is a timer being set in
> nsBuiltinDecoderStateMachine::ScheduleStateMachine. nsITimer::SetTarget is
> being called with the return value from GetStateMachineThread. Since the
> state machine thread hasn't been created yet, GetStateMachineThread would be
> nullptr, thus feeding a mythical NULL mEventTarget into the timer.
> 
> There are numerous checks to ensure that this doesn't happen but they are
> all macros that are no-ops in non-debug builds AFAICT.
> 
> The one problem with this theory is that it's not clear to me who could be
> calling ScheduleStateMachine at this point. It looks like it's so early in
> the decoder initialization that this can't happen.

With a quick glance, you might be on to something here.  I've looked a bit at the guts of our decoder infrastructure before, but I wouldn't dare say that I understand much of it.  Chris however should have a good grasp on this code.

Chris, do you see how this can happen?  I'd appreciate if you can either validate or discredit our theories here.
Flags: needinfo?(cpearce)

Comment 16

5 years ago
Comment on attachment 724498 [details] [diff] [review]
null checking for mEventTarget

I don't allow NS_ENSURE_* in my modules, and in this case I think it should assert, so please use

if (!mEventTarget) {
  NS_ERROR("...";
  return NS_ERROR_NOT_INITIALIZED;
}
Attachment #724498 - Flags: review?(benjamin) → review-
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #15)
> (In reply to Aaron Klotz [:aklotz] from comment #14)
> > I might be completely out to lunch on this, but I offer some food for
> > thought:
> > 
> > As we can see in the dump from comment 6, the main thread is in the midst of
> > creating the state machine thread in
> > StateMachineTracker::EnsureGlobalStateMachine. 
> > 
> > OTOH, there is a timer being set in
> > nsBuiltinDecoderStateMachine::ScheduleStateMachine. nsITimer::SetTarget is
> > being called with the return value from GetStateMachineThread. Since the
> > state machine thread hasn't been created yet, GetStateMachineThread would be
> > nullptr, thus feeding a mythical NULL mEventTarget into the timer.
> > 
> > There are numerous checks to ensure that this doesn't happen but they are
> > all macros that are no-ops in non-debug builds AFAICT.

We can make some of the checks non-no-ops in non-debug builds if it would help to narrow the cause of this crash.

> > 
> > The one problem with this theory is that it's not clear to me who could be
> > calling ScheduleStateMachine at this point. It looks like it's so early in
> > the decoder initialization that this can't happen.

I agree, I don't see how ScheduleStateMachine() could be called this early on during a load.

We only publish a reference to the decoder and thus make it/ScheduleStateMachine() callable by the HTMLMediaElement later on in nsHTMLMediaElement::FinishDecoderSetup(), after MediaDecoder/nsBuiltinDecoder::Load() has finished, but in the stack above we're still working our way through Load().

And the only code that calls ScheduleStateMachine() should either be run on the state machine thread, or be initiated by calls to the MediaDecoder reference we publish after Load() has finished.

We're also quite careful to cancel the timer before we destroy the state machine and state machine thread, and are careful to drop the global reference to the thread before destroying the thread and so forth.

So I don't see how this could happen, and if it could happen I'd expect our tests to catch it.

Having said that, we've been stung enough times by this area of code that I won't rule it out. ;)
Flags: needinfo?(cpearce)

Updated

5 years ago
Blocks: 842004
(Reporter)

Updated

5 years ago
No longer blocks: 842004
Duplicate of this bug: 842004
(Reporter)

Updated

5 years ago
Crash Signature: [@ RtlEnterCriticalSection | PR_Lock | nsThread::Dispatch(nsIRunnable*, unsigned int)] → [@ RtlEnterCriticalSection | PR_Lock | nsThread::Dispatch(nsIRunnable*, unsigned int)] [@ RtlEnterCriticalSection | PR_Lock | nsThread::GetObserver(nsIThreadObserver**) ]
(Assignee)

Comment 19

5 years ago
Created attachment 740903 [details] [diff] [review]
null checking for mEventTarget, v2

Incorporated the changes requested by Benjamin.
Attachment #724498 - Attachment is obsolete: true
Attachment #740903 - Flags: review?(benjamin)

Updated

5 years ago
Attachment #740903 - Flags: review?(benjamin) → review+
https://hg.mozilla.org/mozilla-central/rev/f8077e8edd44
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Reporter)

Comment 23

5 years ago
It's #8 top browser crasher in 20.0.1, #7 in 21.0b3, #10 in 22.0a2, and #76 in 23.0a1.
(Assignee)

Comment 24

5 years ago
I've done some more digging as to what is going on and why many of these crashes are at address 0x20:

The mLock field is a mozilla::Mutex. That object uses a PRLock, which is allocated off the heap. The CRITICAL_SECTION structure is located at offset 0x1c from the beginning of the PRLock structure, and EnterCriticalSection first attempts to modify CRITICAL_SECTION::LockCount, which is at offset 4 from the beginning of the CS, hence our offset of 0x20.

Since mozilla::Mutex sets the PRLock pointer to 0 upon destruction, we're looking at a use-after-free situation with respect to the nsThread object.
Can it not be a null dereference?
(Assignee)

Comment 26

5 years ago
It is a dereference of a null PRLock pointer contained within the nsThread, but pointer to the nsThread object itself would be non-null.
Bug 761987 is a long-standing Android crash bug with thousands of failure reports. Some of those logs appear to be mis-filed and contain stacks very similar to this bug. 

For example: 

https://tbpl.mozilla.org/php/getParsedLog.php?id=22968490&tree=Mozilla-Inbound#error0 contains

Crash reason:  SIGSEGV
Crash address: 0x3c

Thread 19 (crashed)
 0  libnss3.so!PR_Lock [ptsynch.c : 184 + 0x2]
     r4 = 0x00000000    r5 = 0x603ffdec    r6 = 0x8007000e    r7 = 0x603ffdd8
     r8 = 0x0d43053f    r9 = 0x00000094   r10 = 0x603ffe9f    fp = 0x5f81d9cc
     sp = 0x603ffdd0    lr = 0x5f3eb91d    pc = 0x5f3eb91e
    Found by: given as instruction pointer in context
 1  libxul.so!nsThread::GetObserver(nsIThreadObserver**) [Mutex.h : 74 + 0x3]
     sp = 0x603ffdd8    pc = 0x63354023
    Found by: stack scanning
 2  libxul.so!nsThread::PutEvent(nsIRunnable*) [nsThread.h : 70 + 0x7]
     sp = 0x603ffde8    pc = 0x633547ab
    Found by: stack scanning
 3  libxul.so!nsTimerImpl::PostTimerEvent() [nsTimerImpl.cpp:1d3b60ab634a : 676 + 0x9]
     sp = 0x603ffe00    pc = 0x63356993
    Found by: stack scanning
 4  libxul.so!TimerThread::Run() [TimerThread.cpp : 232 + 0x5]
     sp = 0x603ffe18    pc = 0x63356e8b
    Found by: stack scanning
 5  libxul.so!nsThread::ProcessNextEvent(bool, bool*) [nsThread.cpp : 627 + 0x5]
     sp = 0x603ffe50    pc = 0x63354495
    Found by: stack scanning
 6  libxul.so!NS_ProcessNextEvent(nsIThread*, bool) [nsThreadUtils.cpp : 238 + 0x9]
     sp = 0x603ffe98    pc = 0x6333274b
    Found by: stack scanning
 7  libxul.so!nsTransportEventSinkProxy::QueryInterface(nsID const&, void**) [nsTransportUtils.cpp : 92 + 0x11]
     sp = 0x603ffe9c    pc = 0x629d14dd
    Found by: stack scanning
 8  libxul.so!nsThread::ThreadFunc(void*) [nsThread.cpp : 265 + 0x7]
     sp = 0x603ffeb0    pc = 0x63354261
    Found by: stack scanning

I'm not sure if this is the Android expression of the problem under investigation here or a slightly different issue.
Jumped to #5 on Fx21 topcrash list
(Reporter)

Comment 29

5 years ago
Here are interesting correlations in 21.0 on May 24:
  RtlEnterCriticalSection | PR_Lock | nsThread::Dispatch(nsIRunnable*, unsigned int)|EXCEPTION_ACCESS_VIOLATION_WRITE (641 crashes)
     79% (504/641) vs.   1% (1014/69132) msmpeg2adec.dll (Microsoft DTV-DVD Audio Decoder)
          0% (0/641) vs.   0% (3/69132) 11.0.6001.7000
         79% (504/641) vs.   1% (1011/69132) 6.1.7140.0
     98% (628/641) vs.  21% (14805/69132) mfreadwrite.dll (Media Foundation ReadWrite)
          3% (19/641) vs.   2% (1430/69132) 12.0.7600.16385
          5% (32/641) vs.   2% (1640/69132) 12.0.7600.16597
         89% (571/641) vs.  15% (10331/69132) 12.0.7601.17514
          1% (6/641) vs.   0% (285/69132) 12.0.7601.17596
     98% (628/641) vs.  21% (14820/69132) mf.dll (Media Foundation)
          3% (19/641) vs.   2% (1407/69132) 12.0.7600.16385
          5% (32/641) vs.   2% (1630/69132) 12.0.7600.16597
         90% (576/641) vs.  15% (10529/69132) 12.0.7601.17514
          0% (1/641) vs.   0% (50/69132) 12.0.7601.21769
     98% (628/641) vs.  22% (14897/69132) mfplat.dll (Media Foundation Platform)
         98% (628/641) vs.  20% (13774/69132) 12.0.7600.16385
(Assignee)

Comment 30

5 years ago
I've been able to consistently repro this issue using my old Windows XP box when forcing it to boot with a single core. I believe that I have identified the problem and am working on a fix.
(Assignee)

Comment 32

5 years ago
Created attachment 755600 [details] [diff] [review]
nsTimerEvent fix

This was quite the adventure (and exactly the kind of story that I expect to hear from our stability engineer candidates)! I've been able to stamp this out in my repro envrionment. This is _really_ subtle, so please bear with me:

The problem is that nsTimerEvent releases its reference to its timer object too soon. The event has successfully dispatched to the target thread, however in this case the timer thread has not yet returned out of Dispatch() when the nsTimerEvent runs on the target thread.

In the case of MediaDecoderStateMachine, when the timer fires during its shutdown sequence, it triggers the dispatching of some nsRunnables that eventually result in the MediaDecoderStateMachine's destruction on the main thread. As part of its destruction, MediaDecoderStateMachine releases its mTimer object. Also note that the Media State thread shared by all instances of MediaDecoderStateMachine might also be released at this point.

Since nsTimerEvent releases its reference to the same timer when nsTimerEvent::Run() completes, the timer is destroyed and its reference to mEventTarget is also released. Unfortunately mEventTarget is the Media State thread, which was also released above. The timer's efforts to Dispatch() (or finish Dispatch()ing) to that thread will now crash, as the mEventTarget is non-null but its object has been destroyed.

I believe that the fix is to ensure that nsTimerEvent holds a reference to the timer until the nsTimerEvent itself is destroyed, ensuring that the timer and its event target remain valid until dispatching has completely finished.
Attachment #755600 - Flags: review?(benjamin)
Nice catch Aaron!

Comment 34

5 years ago
Comment on attachment 755600 [details] [diff] [review]
nsTimerEvent fix

I'd like a second-review on this.
Attachment #755600 - Flags: review?(ehsan)
Attachment #755600 - Flags: review?(benjamin)
Attachment #755600 - Flags: review+
Comment on attachment 755600 [details] [diff] [review]
nsTimerEvent fix

Review of attachment 755600 [details] [diff] [review]:
-----------------------------------------------------------------

Great analysis!
Attachment #755600 - Flags: review?(ehsan) → review+
(Assignee)

Comment 37

5 years ago
Comment on attachment 755600 [details] [diff] [review]
nsTimerEvent fix

[Approval Request Comment]
Regression caused by (bug #): Unknown, affected code showed revision 1 in hg, so it's pretty old. This bug relies on very subtle thread timings that have been occurring more frequently as of late.
User impact if declined: Browser crashes, particularly on slower machines. #5 topcrasher on release, #10 on beta, #11 on aurora. Crash-stats comments suggest that this often happens while listening to music on pandora.com.
Testing completed (on m-c, etc.): Ran patched nightly and release try builds on test machine that could reproduce issue, no longer crashes
Risk to taking this patch (and alternatives if risky): Very low, a simple change
String or IDL/UUID changes made by this patch: None
Attachment #755600 - Flags: approval-mozilla-release?
Attachment #755600 - Flags: approval-mozilla-beta?
Attachment #755600 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/c51e0af2a043
Status: REOPENED → RESOLVED
Last Resolved: 5 years ago5 years ago
Resolution: --- → FIXED
Target Milestone: mozilla23 → mozilla24

Updated

5 years ago
status-firefox21: --- → wontfix
status-firefox22: --- → affected
status-firefox23: --- → affected
status-firefox24: --- → fixed
tracking-firefox22: --- → +
tracking-firefox23: --- → +
tracking-firefox24: --- → +

Updated

5 years ago
Attachment #755600 - Flags: approval-mozilla-release?
Attachment #755600 - Flags: approval-mozilla-release-
Attachment #755600 - Flags: approval-mozilla-beta?
Attachment #755600 - Flags: approval-mozilla-beta+
Attachment #755600 - Flags: approval-mozilla-aurora?
Attachment #755600 - Flags: approval-mozilla-aurora+

Updated

5 years ago
Duplicate of this bug: 874898

Updated

5 years ago
Duplicate of this bug: 855069
Backed out from Aurora due to frequent Windows XP mochitest leaks. No idea why m-c or beta don't seem to be leaking.
https://hg.mozilla.org/releases/mozilla-aurora/rev/5d2958ada047

https://tbpl.mozilla.org/php/getParsedLog.php?id=23747986&tree=Mozilla-Aurora
https://tbpl.mozilla.org/php/getParsedLog.php?id=23744096&tree=Mozilla-Aurora
status-firefox23: fixed → affected
(Assignee)

Updated

5 years ago
Depends on: 885095
Comment on attachment 755600 [details] [diff] [review]
nsTimerEvent fix

Missed the Aurora train. If this still needs to land on Fx23, please request beta approval for whatever patch needs to eventually land.

(Clearing the old approval flags so this doesn't turn up in my queries in the mean time)
Attachment #755600 - Flags: approval-mozilla-beta+
Attachment #755600 - Flags: approval-mozilla-aurora+

Comment 44

5 years ago
Ugh, wait, this has landed in 22 and 24 but was backed out from 23?
Correct
Can we get a Beta uplift nomination?
Flags: needinfo?(aklotz)
Discussed with Aaron in IRC and since bug 885095 has been filed to track the intermittent leak in mochitests that was uncovered by this bug fix we do not believe this bug to be the cause so must take it on branch and track investigation of the leak in bug 885095.
(Assignee)

Comment 48

5 years ago
Comment on attachment 755600 [details] [diff] [review]
nsTimerEvent fix

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Unknown, affected code showed revision 1 in hg, so it's pretty old. This bug relies on very subtle thread timings that have been occurring more often lately.
User impact if declined: Browser crashes, particularly on slower machines. Previously a #5 topcrasher on Firefox 21 release. Currently #19 on beta. Crash-stats comments suggest that this often happens while listening to music on pandora.com.
Testing completed (on m-c, etc.): Currently landed on nightly, aurora, and release, eliminating this crash on those channels.
Risk to taking this patch (and alternatives if risky): Low. There is a known intermittent leak being tracked in bug 885095 that occurs after landing this patch. As commented in that bug, this patch reveals those symptoms but is not actually the cause.
String or IDL/UUID changes made by this patch: None.
Attachment #755600 - Flags: approval-mozilla-beta?
(Assignee)

Comment 49

5 years ago
Clearing needinfo.
Flags: needinfo?(aklotz)
Attachment #755600 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
https://hg.mozilla.org/releases/mozilla-beta/rev/b75aa2e0a5a8

Here goes...
status-firefox23: affected → fixed
Keywords: verifyme
Verified on Firefox 23 beta 5 (build ID: 20130711122148).
No crash occured in more than one hour using the attached testcases.
status-firefox23: fixed → verified
Keywords: verifyme
QA Contact: cornel.ionce
Using the attached testcases no crash occured for more than 90 minutes.
Verified on Firefox 24 beta 4 (build ID: 20130819170952)
status-firefox24: fixed → verified
You need to log in before you can comment on or make changes to this bug.