Closed Bug 720679 Opened 12 years ago Closed 12 years ago

Crash @ mozilla::dom::workers::WorkerPrivate::CancelAllTimeouts while closing Firefox

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
critical

Tracking

RESOLVED FIXED
mozilla14
Tracking Status
firefox11 + wontfix
firefox12 + fixed
firefox13 --- fixed
firefox-esr10 - wontfix

People

(Reporter: scoobidiver, Assigned: bent.mozilla)

Details

(Keywords: crash, regression, topcrash, Whiteboard: [qa-])

Attachments

(1 file)

It's #83 top crasher in 12.0a1.

It's a new crash signature that first appeared with 32-bit builds in 12.0a1/20120119.
The regression range might be:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=f4049f65efc6&tochange=58e933465c36
It's likely a regression from bug 718100.

According to comments, it occurs while closing Firefox.

Signature 	mozilla::dom::workers::WorkerPrivate::CancelAllTimeouts(JSContext*)
UUID	6d879672-fce8-4ed9-8036-8731b2120124
Date Processed	2012-01-24 04:54:45
Uptime	49989
Last Crash	6.9 weeks before submission
Install Age	21.0 hours since version was first installed.
Install Time	2012-01-23 07:50:45
Product	Firefox
Version	12.0a1
Build ID	20120122031050
Release Channel	nightly
OS	Windows NT
OS Version	6.1.7601 Service Pack 1
Build Architecture	x86
Build Architecture Info	GenuineIntel family 6 model 22 stepping 1
Crash Reason	EXCEPTION_ACCESS_VIOLATION_READ
Crash Address	0x0
App Notes 	
AdapterVendorID: 0x8086, AdapterDeviceID: 0x2e32, AdapterSubsysID: 31031565, AdapterDriverVersion: 8.15.10.2202
D2D? D2D+
DWrite? DWrite+
D3D10 Layers? D3D10 Layers+
EMCheckCompatibility	True

Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::dom::workers::WorkerPrivate::CancelAllTimeouts 	dom/workers/WorkerPrivate.cpp:3133
1 	xul.dll 	mozilla::dom::workers::WorkerPrivate::NotifyFeatures 	dom/workers/WorkerPrivate.cpp:3104
2 	xul.dll 	mozilla::dom::workers::WorkerPrivate::NotifyInternal 	dom/workers/WorkerPrivate.cpp:3298
3 	xul.dll 	`anonymous namespace'::NotifyRunnable::WorkerRun 	dom/workers/WorkerPrivate.cpp:942
4 	xul.dll 	mozilla::dom::workers::WorkerRunnable::Run 	dom/workers/WorkerPrivate.cpp:1709
5 	xul.dll 	mozilla::dom::workers::WorkerPrivate::ProcessAllControlRunnables 	dom/workers/WorkerPrivate.cpp:2873
6 	xul.dll 	mozilla::dom::workers::WorkerPrivate::OperationCallback 	dom/workers/WorkerPrivate.cpp:2731
7 	xul.dll 	`anonymous namespace'::OperationCallback 	dom/workers/RuntimeService.cpp:275
8 	mozjs.dll 	js_InvokeOperationCallback 	js/src/jscntxt.cpp:1252
9 	mozjs.dll 	js_HandleExecutionInterrupt 	js/src/jscntxt.cpp:1260
10 	mozjs.dll 	js::Interpret 	js/src/jsinterp.cpp:1917
11 	mozjs.dll 	js::types::TypeScript::SetThis 	js/src/jsinferinlines.h:667
12 	mozjs.dll 	js::ExecuteKernel 	js/src/jsinterp.cpp:711
13 	mozjs.dll 	js::Execute 	js/src/jsinterp.cpp:752
14 	mozjs.dll 	EvaluateUCScriptForPrincipalsCommon 	js/src/jsapi.cpp:5326
15 	mozjs.dll 	JS_EvaluateUCScriptForPrincipals 	js/src/jsapi.cpp:5337
16 	xul.dll 	mozilla::dom::workers::WorkerPrivate::RunExpiredTimeouts 	dom/workers/WorkerPrivate.cpp:3668
17 	xul.dll 	`anonymous namespace'::TimerRunnable::WorkerRun 	dom/workers/WorkerPrivate.cpp:1227
18 	xul.dll 	mozilla::dom::workers::WorkerRunnable::Run 	dom/workers/WorkerPrivate.cpp:1709
19 	xul.dll 	mozilla::dom::workers::WorkerPrivate::DoRunLoop 	dom/workers/WorkerPrivate.cpp:2638
20 	xul.dll 	`anonymous namespace'::WorkerThreadRunnable::Run 	dom/workers/RuntimeService.cpp:361
21 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:657
22 	xul.dll 	nsThreadStartupEvent::`vector deleting destructor' 	
23 	xul.dll 	nsThread::ThreadFunc 	xpcom/threads/nsThread.cpp:289
24 	nspr4.dll 	_PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c:426

More reports at:
https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3Adom%3A%3Aworkers%3A%3AWorkerPrivate%3A%3ACancelAllTimeouts%28JSContext*%29
Assignee: nobody → bent.mozilla
Following the signature link you posted above

https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3Adom%3A%3Aworkers%3A%3AWorkerPrivate%3A%3ACancelAllTimeouts%28JSContext*%29

I see plenty of crashes here as early as Firefox 8. A bug, no doubt, but not related to recent GC changes in bug 718100.
No longer blocks: 718100
(In reply to ben turner [:bent] from comment #1)
> I see plenty of crashes here as early as Firefox 8.
Yes but there's a spike in crashes starting in 12.0a1/20120119:
https://crash-stats.mozilla.com/report/list?version=Firefox%3A12.0a1&query_search=signature&query_type=contains&reason_type=contains&range_value=4&range_unit=weeks&hang_type=any&process_type=any&signature=mozilla%3A%3Adom%3A%3Aworkers%3A%3AWorkerPrivate%3A%3ACancelAllTimeouts%28JSContext*%29
It's not caused by new Nightly users according to the following comment:
"applying nightly update"
On trunk, this first appeared in recent times on 2012-01-19 and has risen continuously since then. This may be something new, or it may be Nightly testers increasingly using features that release users are also stumbling over. It is hard to say without someone who knows the code actually digging into it.

In any case, something is going wrong there, and with something like workers, which are probably not in as intense use on the web yet as older web features, this might warrant a look, as we really want workers to work (no pun intended) out there.
It's currently #19 top browser crasher in 12.0a1 over the last week. The cause of the spike needs to be fixed.
There are 3 crashes per day ... this is hardly some explosive topcrasher.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #5)
> There are 3 crashes per day
I count 5 crashes per buildID over the last week with currently no crashes in 12.0a1/20120130.
Let's track when this is deemed a topcrash on 12.0a2.
It's #36 top browser crasher in 11.0b2, #32 in 12.0a2 and #31 in 13.0a1.

Here are correlations per add-on in 11.0 on Feb 15:
     94% (85/90) vs.   1% (344/32012) support@lastpass.com (LastPass Password Manager, https://addons.mozilla.org/addon/8542)
     44% (40/90) vs.   9% (2742/32012) {d10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d} (Adblock Plus, https://addons.mozilla.org/addon/1865)
     24% (22/90) vs.   2% (586/32012) {D4DD63FA-01E4-46a7-B6B1-EDAB7D6AD389} (Download Statusbar, https://addons.mozilla.org/addon/26)
     21% (19/90) vs.   1% (369/32012) foxmarks@kei.com (Xmarks (formerly Foxmarks), https://addons.mozilla.org/addon/2410)
     17% (15/90) vs.   2% (553/32012) {DDC359D1-844A-42a7-9AA1-88A850A938A8} (DownThemAll!, https://addons.mozilla.org/addon/201)
     14% (13/90) vs.   1% (480/32012) {a0d7ccb3-214d-498b-b4aa-0e8fda9a7bf7} (WOT, https://addons.mozilla.org/addon/3456)
     33% (30/90) vs.  22% (7025/32012) firefox-hotfix@mozilla.org
     13% (12/90) vs.   2% (722/32012) personas@christopher.beard (Personas, https://addons.mozilla.org/addon/10900)
     12% (11/90) vs.   1% (475/32012) compatibility@addons.mozilla.org
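
For readers unfamiliar with these correlation lines, the following is a minimal C++ sketch of the arithmetic, assuming each line reads as "share of crashes with this signature that have the add-on" versus "share of all crashes in that release that have the add-on". The `Correlation` struct and the `Percent` and `Lift` helpers are hypothetical names introduced for illustration; they are not part of any crash-stats tooling.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical container for one correlation line, e.g.
// "94% (85/90) vs. 1% (344/32012) support@lastpass.com".
struct Correlation {
  long withAddonInSignature;  // 85: crashes with this signature and the add-on
  long signatureTotal;        // 90: all crashes with this signature
  long withAddonOverall;      // 344: all crashes that have the add-on
  long overallTotal;          // 32012: all crashes in the release
};

// Percentage rounded to the nearest integer, as printed in the report.
long Percent(long count, long total) {
  return std::lround(100.0 * static_cast<double>(count) /
                     static_cast<double>(total));
}

// Ratio of the two shares: how over-represented the add-on is among
// crashes with this signature compared with all crashes.
double Lift(const Correlation& c) {
  const double inSignature =
      static_cast<double>(c.withAddonInSignature) / c.signatureTotal;
  const double overall =
      static_cast<double>(c.withAddonOverall) / c.overallTotal;
  return inSignature / overall;
}
```

Under that reading, the LastPass line stands out because the add-on is many times more common among crashes with this signature than among crashes in general, whereas the firefox-hotfix line (33% vs. 22%) is close to its baseline.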
Keywords: regression, topcrash
Version: 12 Branch → unspecified
Crash Signature: [@ mozilla::dom::workers::WorkerPrivate::CancelAllTimeouts(JSContext*)] → [@ mozilla::dom::workers::WorkerPrivate::CancelAllTimeouts(JSContext*)] [@ mozilla::dom::workers::WorkerPrivate::NotifyFeatures]
OS: Windows 7 → All
(In reply to Scoobidiver from comment #8)
> It's #36 top browser crasher in 11.0b2, #32 in 12.0a2 and #31 in 13.0a1.
> 
> Here are correlations per add-on in 11.0 on Feb 15:
>      94% (85/90) vs.   1% (344/32012) support@lastpass.com (LastPass
> Password Manager, https://addons.mozilla.org/addon/8542)

Based upon Scoobidiver's correlations, let's move forward with trying to reproduce. Adding qawanted.
Keywords: qawanted
Two of the comments from the Mac signature mention something related to "The updater crashed FF - happened on the last one too"

"installed donottrack plus addon and rebooted"

I will try the second scenario and report back.
Marcia, I suspect this has something to do with shutdown while having the lastpass extension installed.
So this is some volume regression that happened in 11. In the past week we have 372 on 11b2, 278 on 11b3. I looked back at levels for 10 betas and we have maybe 1-2 crashes. On 10.0.1 and 10.0.2 the volume is really low. 

Anything more we can pull from the crash reports that might help?
Keywords: regression
I took a first pass at reproducing this using 11.0b3 with Last Pass 1.80.0. So far I haven't been able to reproduce it. I am going to look at the addon correlations/correlations by version again as well as read the comments to see what else I can glean.
I looked at the 63 Mac reports to see if I could get any information from those reports. Some of those users had Version 1.90.0 of Last Pass. So far no luck on Mac either after multiple shutdowns and restarts.
I've been getting crashes on shutdown pretty consistently lately on my home Windows machine. https://crash-stats.mozilla.com/report/index/5ffb8990-0b10-4f89-995a-db6802120222 is one. I also have LastPass installed. Not sure what version though.

I think it started for me when I switched from the Beta to Aurora channel on my home desktop within the last 2 or 3 weeks. I believe this coincided with an extensions update refresh and I also believe LastPass was upgraded (by me) around that time because the old version was marked as incompatible. I can't recall exactly what happened.

What do you want from me? I could probably install a debug build and get a full dump from Visual Studio if it would help.
(In reply to Gregory Szorc [:gps] from comment #15)
> I think it started for me when I switched from the Beta to Aurora channel

How did you make this switch?

> I could probably install a debug build and get a full dump from 
> Visual Studio if it would help.

It certainly would not hurt. It would give us more visibility into the issue.
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #16)
> (In reply to Gregory Szorc [:gps] from comment #15)
> > I think it started for me when I switched from the Beta to Aurora channel
> 
> How did you make this switch?

I installed an update through the about page. I'm pretty sure I've seen it crash on update shutdown 3 or 4 times in the past few weeks when installing Aurora updates.
But how did you change from using Beta to using Aurora? There isn't a channel switcher in the About dialog anymore (AFAIK).

* Did you uninstall Beta and install Aurora?
* Did you install Aurora on top of Beta?
* Did you install Aurora to its own folder and start with the same Beta profile?
* Did you install Aurora to its own folder and start with a new profile?
* Did you edit your channel-prefs.js file to change "beta" to "aurora" and install an update?
I downloaded an Aurora installer, installed it, then started launching Aurora instead of Beta. I used the same profile. AFAIK I haven't opened the profile with Beta since I made the switch.
Okay, thanks, that helps.

Curious, your crash report seems to indicate LastPass v1.90. However, the latest version on AMO is 1.80. Where and how did you update LastPass?
So far I've not been able to reproduce this, so there might be other variables at play than just a simple software update with LastPass installed:

1) Installed 11.0b4 and Lastpass 1.80 from AMO
2) Installed 2012-02-23 Aurora and started with the same profile
3) Downloaded and installed LastPass 1.90
4) Check for and install updates from About dialog
5) Pave over Aurora with 2012-02-22 and repeat step 4
6) Pave over Aurora with 2012-02-21 and repeat step 4
7) Pave over Aurora with 2012-02-20 and repeat step 4

No crashes yet...

I expect at this point it would be helpful to have some debug information.
I'd also like to see what Ben Turner advises.
I just installed 2012-02-20's Aurora then updated through about and didn't experience a crash. I'll keep on it.

FWIW, all of my crashes are:

https://crash-stats.mozilla.com/report/index/bp-7e6d3a15-49ec-4f3b-9475-3da202120224
https://crash-stats.mozilla.com/report/index/bp-5ffb8990-0b10-4f89-995a-db6802120222
https://crash-stats.mozilla.com/report/index/bp-acd5be45-dbea-40db-a32a-9d4a52120219

You'll notice I was on LastPass 1.80 in the first/oldest one.
Got one again today: https://crash-stats.mozilla.com/report/index/bp-4376d240-1145-4075-9580-e43dc2120228

However, I think the crash reporter handled the exception before Windows prompted me to open Visual Studio to debug it. If I disable crash reporter via application.ini, will the crash go unhandled causing the Visual Studio prompt to appear? Or, do I need to launch the executable from within VS?
I have seen some crash reports with LP 1.80 and some with 1.90. The latest correlations still show a high correlation to LastPass, but I still have not been able to reproduce it.

mozilla::dom::workers::WorkerPrivate::CancelAllTimeouts(JSContext*)|EXCEPTION_ACCESS_VIOLATION_READ (12 crashes)
    100% (12/12) vs.   3% (53/1635) support@lastpass.com (LastPass Password Manager, https://addons.mozilla.org/addon/8542)
(In reply to Gregory Szorc [:gps] from comment #24)
> Got one again today:
> https://crash-stats.mozilla.com/report/index/bp-4376d240-1145-4075-9580-e43dc2120228
> 
> However, I think the crash reporter handled the exception before Windows
> prompted me to open Visual Studio to debug it. If I disable crash reporter
> via application.ini, will the crash go unhandled causing the Visual Studio
> prompt to appear? Or, do I need to launch the executable from within VS?

Benjamin may know how to debug in this instance. Including him here.
You can set MOZ_CRASHREPORTER_DISABLE=1 in your environment before launching Firefox to disable the Mozilla crash reporter and attach directly using the MSVC JIT-debugging dialog. Or you can attach to a running Firefox instance using "Attach to Process" in the MSVC menu.
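
As an illustration of the first option, here is a minimal C++ sketch of a launcher that sets the variable and then starts the browser so the child process inherits it. Only the `MOZ_CRASHREPORTER_DISABLE` name comes from the comment above; the helper names and the idea of a wrapper program are assumptions, and the command passed in (for example the path to firefox.exe) is up to the caller.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Sets MOZ_CRASHREPORTER_DISABLE=1 in this process's environment so that
// child processes started afterwards inherit it. Returns true on success.
// (Hypothetical helper, not part of Firefox.)
bool DisableCrashReporterForChildren() {
#ifdef _WIN32
  return _putenv_s("MOZ_CRASHREPORTER_DISABLE", "1") == 0;
#else
  return setenv("MOZ_CRASHREPORTER_DISABLE", "1", /*overwrite=*/1) == 0;
#endif
}

// Runs the given command through the system shell with the variable set.
// Returns -1 if the variable could not be set, otherwise whatever
// std::system returns for the command.
int LaunchWithoutCrashReporter(const std::string& command) {
  if (!DisableCrashReporterForChildren()) {
    return -1;
  }
  return std::system(command.c_str());
}
```

Setting the variable in a command prompt and launching Firefox from that same prompt should have the same effect; the wrapper only makes the inheritance explicit.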
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #27)
> You can set MOZ_CRASHREPORTER_DISABLE=1 in your environment before launching
> Firefox to disable the Mozilla crash reporter and attach directly using the
> MSVC JIT-debugging dialog. Or you can attach to a running Firefox instance
> using "Attach to Process" in the MSVC menu.

Thanks Benjamin!

gps - can you attempt to reproduce with MOZ_CRASHREPORTER_DISABLE=1 next? Thanks!
I've been waiting for another crash to occur.

I actually tried with the env variable set and experienced a crash on restart. But, the expected "this program has crashed - would you like to debug" prompt didn't appear. I now have VS attached to a running process, which will hopefully detect the memory access violation as soon as it occurs. I'll give that a few more days before building and running the tree myself.
Crash Signature: [@ mozilla::dom::workers::WorkerPrivate::CancelAllTimeouts(JSContext*)] [@ mozilla::dom::workers::WorkerPrivate::NotifyFeatures] → [@ mozilla::dom::workers::WorkerPrivate::CancelAllTimeouts(JSContext*)] [@ mozilla::dom::workers::WorkerPrivate::NotifyFeatures] [@ mozilla::dom::workers::WorkerPrivate::CancelAllTimeouts ]
I finally got Visual Studio to capture a crash!

I'm running 11.0b3 (bb27964243e7). Visual Studio has downloaded symbols from the symbol server all right. Minidump file saved. Now what do I do?
Bring the machine into the office and sit down with bent?
The machine is a desktop, so bringing it to the office is out of the question.

However, I created a dump with full heap that anybody with Visual Studio should be able to load. I have that 800MB dump on a portable HD. Since I'm sure the heap contains sensitive information, I don't want to post it unencrypted. So, I need someone specific to speak up [to receive the decryption key].

Also, I wasn't catching thrown exceptions, so in the dump the stack has already been unwound to a core Windows DLL. Therefore, I'm not sure how useful the dump will be. I have since set up VS to break on thrown exceptions, so hopefully I'll see this again in the next few days and will have the actual frames at the time of the NULL access.
Version: unspecified → 11 Branch
Gregory: I would suggest connecting with bent as he would be the best person to try to debug this problem and receive the decryption key.
I did finally hit this crash on Mac on my lab machine - https://crash-stats.mozilla.com/report/index/bp-c77d982d-0fab-432d-b987-b2fec2120319 - but I have not been able to reproduce it yet. Right now someone working with Gregory to help him debug the dump he has would be our best chance moving forward.
I have the dump on a portable HD. I usually work from the SF office. I don't have the drive with me today, however.

Also, I haven't seen any crashes for ~10 days now. I noticed that LastPass was upgraded. They did fix a zombie compartment in the newest version. Correlation == causation?
(In reply to Gregory Szorc [:gps] from comment #36)
> I have the dump on a portable HD. I usually work from the SF office. I don't
> have the drive with me today, however.

I've asked Ben to take a look at your dump.

> Also, I haven't seen any crashes for ~10 days now. I noticed that LastPass
> was upgraded. They did fix a zombie compartment in the newest version.
> Correlation == causation?

We should be on the lookout for this signature dropping off if we think LastPass may have fixed the issue.
https://crash-stats.mozilla.com/report/list?signature=mozilla::dom::workers::WorkerPrivate::CancelAllTimeouts%28JSContext*%29 still has a fair number of crashes across all versions, but very few on current trunk or Aurora.

The other two signatures are much lower in volume due to the fact they are the Mac and Linux signatures.
(In reply to Alex Keybl [:akeybl] from comment #37)
> We should be on the lookout for this signature dropping off if we think
> LastPass may have fixed the issue.
It hasn't fixed it based on correlations per extension in 11.0:
     97% (36/37) vs.   1% (82/13377) support@lastpass.com (LastPass Password Manager, https://addons.mozilla.org/addon/8542)
          3% (1/37) vs.   0% (1/13377) 1.80.0
         46% (17/37) vs.   0% (42/13377) 1.90.4 (version on AMO)
         49% (18/37) vs.   0% (39/13377) 1.90.6
See also https://addons.mozilla.org/firefox/addon/lastpass-password-manager/versions/
It's #48 on FF12b1 and #45 on FF11. Back before we shipped 11, it was much higher volume. Any way we can verify the crashes are happening with an older version of LastPass? If so, then we can remove the tracking flag.
Booya! I caught this exception just now. And, I have a heap dump before the stack unwound!

js::Execute reveals the JS being executed is chrome://lastpass/content/lpctypesworker.js. I don't know enough about the internals to find any more information. I should be in SF again tomorrow for a heap dump transfer.
Target Milestone: --- → mozilla13
Version: 11 Branch → Trunk
Greg's dump helped, I know what's going on. Fix is not too tricky, will attach patch as soon as I can test it.
Status: NEW → ASSIGNED
Attached patch Patch. v1
This fixes it. Basically RunExpiredTimeouts doesn't do anything if called recursively, and CancelAllTimeouts wasn't expecting to be called with that on the stack.

Either sicking or khuey can review, no need for two :)
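
The failure mode described above can be illustrated with a small, self-contained C++ toy model. This is not the attached patch and not the real WorkerPrivate code: `TimeoutQueue` and all of its members are hypothetical names, and the sketch treats every queued timeout as already expired. It shows one way a cancel routine can tolerate being called while the run routine is already on the stack: the nested run call is a no-op, so the cancel routine only marks entries as canceled and leaves cleanup to the outermost run call.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Toy model only; names are hypothetical, not the Gecko API.
class TimeoutQueue {
 public:
  using Callback = std::function<void(TimeoutQueue&)>;

  void Add(Callback cb) {
    mTimeouts.push_back(std::make_shared<Entry>(Entry{std::move(cb), false}));
  }

  // Runs every queued timeout. A nested (recursive) call does nothing;
  // the outermost call is responsible for running and cleaning up.
  void RunExpired() {
    if (mRunning) {
      return;
    }
    mRunning = true;
    // Index-based loop because callbacks may add new entries.
    for (std::size_t i = 0; i < mTimeouts.size(); ++i) {
      std::shared_ptr<Entry> entry = mTimeouts[i];  // keep alive during call
      if (!entry->canceled) {
        entry->callback(*this);
      }
    }
    mTimeouts.clear();
    mRunning = false;
  }

  // Safe to call from inside a timeout callback. It never relies on a
  // nested RunExpired() call to do the cleanup.
  void CancelAll() {
    for (auto& entry : mTimeouts) {
      entry->canceled = true;
    }
    if (!mRunning) {
      mTimeouts.clear();  // nobody else will clean up, so do it here
    }
    // Otherwise the RunExpired() call already on the stack skips the
    // canceled entries and clears the list when it finishes.
  }

  std::size_t Pending() const { return mTimeouts.size(); }

 private:
  struct Entry {
    Callback callback;
    bool canceled;
  };

  std::vector<std::shared_ptr<Entry>> mTimeouts;
  bool mRunning = false;
};
```

In this model, a timeout callback that calls `CancelAll()` (analogous to the cancel notification arriving in the middle of `RunExpiredTimeouts` in the crash stack) leaves the queue in a consistent state: later entries are skipped and the list is emptied once the outer run call returns.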
Attachment #609558 - Flags: review?(khuey)
Attachment #609558 - Flags: review?(jonas)
Is there anything more QA can do to serve qawanted? We still don't have a reproducible case to verify any fix, afaik.
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #44)
> Is there anything more QA can do to serve qawanted? We still don't have a
> reproducible case to verify any fix, afaik.

Doesn't sound like it. Fortunately I've included an automated test!
Removing qawanted as per comment 46. Please re-add if there is something specific QA needs to do here.
Keywords: qawanted
Target Milestone: mozilla13 → mozilla14
https://hg.mozilla.org/mozilla-central/rev/c346941eebda
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Ben - what's your feeling on uplifting this to Beta 12 ahead of our next beta (goes to build next Tuesday 4/3)?
Comment on attachment 609558
Patch. v1

[Approval Request Comment]
Regression caused by (bug #): new dom workers
User impact if declined: Crashes on shutdown with some lastpass versions
Testing completed (on m-c, etc.): m-c, has mochitest
Risk to taking this patch (and alternatives if risky): minimal, have lots of other tests exercising this code
String changes made by this patch: None
Attachment #609558 - Flags: approval-mozilla-beta?
Attachment #609558 - Flags: approval-mozilla-aurora?
Comment on attachment 609558
Patch. v1

[Triage Comment]
Low-risk fix for a FF12 top crasher - approved for Aurora 13 and Beta 12.
Attachment #609558 - Flags: approval-mozilla-beta?
Attachment #609558 - Flags: approval-mozilla-beta+
Attachment #609558 - Flags: approval-mozilla-aurora?
Attachment #609558 - Flags: approval-mozilla-aurora+
[Triage Comment]
Searched crash-stats for ESR and this doesn't show up, marking unaffected, untracking.
Just because no-one is using the ESR release to go to sites which hit this bug, doesn't mean that the ESR release is unaffected. (absence of evidence isn't evidence of absence and all that :) )

Based on conversations with Bent, this bug is still in the ESR code.

It is of course a separate question whether we think it's important enough to fix. Given the low (apparently 0) volume of crashes it seems like it's not a big problem for our users right now.

However, given that it was hit a lot at one point, I wouldn't be surprised if ESR users start triggering this code more eventually, for example as the WebWorkers feature gains more use on the web.
(In reply to Lukas Blakk [:lsblakk] from comment #53)
> [Triage Comment]
> Searched crash-stats for ESR and this doesn't show up
With a 10% throttle on ESR, you have little chance of seeing it in crash stats.
(In reply to Scoobidiver from comment #55)
> (In reply to Lukas Blakk [:lsblakk] from comment #53)
> > [Triage Comment]
> > Searched crash-stats for ESR and this doesn't show up
> With a 10% throttle on ESR, you have little chance of seeing it in crash stats.

As pointed out elsewhere, I'm pretty sure we don't throttle ESR reports.
[Triage Comment]
We don't think this is important enough to fix for ESR, so removing tracking.
qa- for verification given that QA was never able to reproduce this crash. If you are able to reproduce it, please verify this is fixed in Firefox 12 and 13.0b3. Thanks.
Whiteboard: [qa-]
Component: DOM → DOM: Core & HTML