Closed Bug 980876 Opened 10 years ago Closed 10 years ago

crash in mozilla::dom::GamepadService::NewAxisMoveEvent(unsigned int, unsigned int, double)

Categories

(Core :: DOM: UI Events & Focus Handling, defect)

x86_64
Windows NT
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla31
Tracking Status
firefox28 + disabled
firefox29 + verified
firefox30 + verified
firefox31 --- verified
firefox-esr24 --- unaffected
b2g-v1.2 --- unaffected
b2g-v1.3 --- unaffected
b2g-v1.4 --- fixed
b2g-v2.0 --- fixed

People

(Reporter: adalucinet, Assigned: ted)

References

()

Details

(4 keywords)

Crash Data

Attachments

(3 files, 4 obsolete files)

This bug was filed from the Socorro interface and is 
report bp-b51ce988-a231-4d4a-8920-4dee52140307.
=============================================================
More reports: https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3Adom%3A%3AGamepadService%3A%3ANewAxisMoveEvent%28unsigned+int%2C+unsigned+int%2C+double%29&product=Firefox&query_type=contains&range_unit=weeks&process_type=any&hang_type=any&date=2014-03-07+10%3A00%3A00&range_value=4#reports

STR:
1. Navigate to http://luser.github.io/gamepadtest/
2. Continuously press the Left Stick on the controller and restart Firefox via Developer Toolbar.

Notes:
1. With Nightly from 2013-12-08 (when bug 878828 was pushed), I get this crash: bp-5c4ee358-5275-4a47-8314-925d22140307. I'll investigate further.
2. No crashes on Ubuntu nor Mac OS X.
3. Reproduced with a Xbox 360 controller and Logitech Gamepad F310.
4. Reproducible with latest Aurora and Nightly.
5. It is an intermittent issue, but with the above steps I hit this after 2 or 3 restarts.
This is also reproducible with Firefox 28 beta 9 (Build ID: 20140306171728):
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0
It's possible I can wallpaper over this in the Gamepad code, but the fact that we're in a nested event loop from the media shutdown code looks pretty terrible:
Frame	Module	Signature	Source
0	xul.dll	mozilla::dom::GamepadService::NewAxisMoveEvent(unsigned int,unsigned int,double)	dom/gamepad/GamepadService.cpp
1	xul.dll	`anonymous namespace'::GamepadEvent::Run()	hal/windows/WindowsGamepad.cpp
2	xul.dll	nsThread::ProcessNextEvent(bool,bool *)	xpcom/threads/nsThread.cpp
3	xul.dll	mozilla::StateMachineThread::SpinUntilShutdownComplete()	content/media/MediaShutdownManager.cpp
4	xul.dll	mozilla::JoinStateMachineThreads	content/media/MediaShutdownManager.cpp
5	xul.dll	nsTHashtable<nsPtrHashKey<nsFontFaceLoader> >::s_EnumStub(PLDHashTable *,PLDHashEntryHdr *,unsigned int,void *)	obj-firefox/dist/include/nsTHashtable.h
6	xul.dll	PL_DHashTableEnumerate	xpcom/glue/pldhash.cpp
7	xul.dll	nsTHashtable<nsRefPtrHashKey<mozilla::StateMachineThread> >::EnumerateEntries(PLDHashOperator (*)(nsRefPtrHashKey<mozilla::StateMachineThread> *,void *),void *)	obj-firefox/dist/include/nsTHashtable.h
8	xul.dll	mozilla::MediaShutdownManager::Shutdown()	content/media/MediaShutdownManager.cpp
9	xul.dll	mozilla::MediaShutdownManager::Observe(nsISupports *,char const *,wchar_t const *)	content/media/MediaShutdownManager.cpp
10	xul.dll	nsObserverService::NotifyObservers(nsISupports *,char const *,wchar_t const *)	xpcom/ds/nsObserverService.cpp
11	xul.dll	mozilla::ShutdownXPCOM(nsIServiceManager *)	xpcom/build/nsXPComInit.cpp
12	xul.dll	XREMain::XRE_main(int,char * * const,nsXREAppData const *)	toolkit/xre/nsAppRunner.cpp 

Presumably the nested event loop (under the xpcom shutdown notification) has screwed up some event ordering that the GamepadService was expecting. I'm not sure whether the gamepad service got deleted here somehow (it looks like the mGamepads access is near NULL):
http://hg.mozilla.org/releases/mozilla-beta/annotate/ade116ed614e/dom/gamepad/GamepadService.cpp#l264

since it's a ClearOnShutdown pointer, which shouldn't have been cleared yet:
http://hg.mozilla.org/releases/mozilla-beta/annotate/ade116ed614e/dom/gamepad/GamepadService.cpp#l424

The stack says this is firing the xpcom-shutdown notification:
http://hg.mozilla.org/releases/mozilla-beta/annotate/ade116ed614e/xpcom/build/nsXPComInit.cpp#l668

The GamepadService listens for xpcom-will-shutdown to set mShuttingDown:
http://hg.mozilla.org/releases/mozilla-beta/annotate/ade116ed614e/dom/gamepad/GamepadService.cpp#l66

So clearly something is going wrong here.
Well, nothing prevents more stuff to be added to mGamepads array while the runnable is in
mainthreads event queue. And adding more stuff to the array may move Gamepad objects.
Group: core-security
We only ever add or remove to the end of the mGamepads array:
http://mxr.mozilla.org/mozilla-central/source/dom/gamepad/GamepadService.cpp#118

I don't remember why that's a reference, maybe it was more useful in a previous incarnation. That wouldn't be hard to fix, it's just using the .id and .numAxes fields anyway. I don't see how that'd cause the problem at hand though. I need to try reproducing this locally.
If you add more stuff to the array, you end up doing memmoves when the buffer nsTArray uses
is increased (effectively replaced with a new larger one)
Okay, but we're only passing indices around. The Gamepad type in WindowsGamepad.cpp is not mozilla::dom::Gamepad, it's a struct defined in that file:
http://hg.mozilla.org/mozilla-central/annotate/8122ffa9e1aa/hal/windows/WindowsGamepad.cpp#l41

The |id| member is the index into GamepadService::mGamepads.
Right, but that's a reference to a struct Gamepad defined in that file, not a mozilla::dom::Gamepad, which is what GamepadService::mGamepads contains.
smaug and I figured out how we were talking past each other on IRC, he's right.

I loaded this dump in a debugger, the top frame is:
0025dacc 0f6cf349 xul!mozilla::dom::GamepadService::NewAxisMoveEvent(unsigned int aIndex = 0x5a5a5a5a, unsigned int aAxis = 2, double aValue = -0.33202105760574341)+0x9 [c:\builds\moz2_slave\m-cen-w32-ntly-000000000000000\build\dom\gamepad\gamepadservice.cpp @ 264]

So it's almost certainly true that while an event is pending, the mGamepads of WindowsGamepadService got changed and the reference went invalid.
Should this be considered a blocker for Firefox 28?
Yes! Either we need to disable gamepad API in FF28 or fix this.
(In reply to Olli Pettay [:smaug] from comment #14)
> Yes! Either we need to disable gamepad API in FF28 or fix this.

We merge Firefox 28 to release today -- needinfo lsbakk
Flags: needinfo?(lsblakk)
I'm close to having a patch, but if today is merge day then it's going to have to be disabled.
Backed out the "enable gamepad by default" patch on beta with a=lsblakk:
https://hg.mozilla.org/releases/mozilla-beta/rev/c6c843644fbf
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0
Mozilla/5.0 (X11; Linux x86_64; rv:28.0) Gecko/20100101 Firefox/28.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0

Gamepad API is properly disabled on Firefox 28 RC (Build ID: 20140310174439): dom.gamepad.enabled is set to false by default.
Does this happen on OSX or Linux?  If so, it might be good to run it on an ASAN.
No, this is Windows-only. (The Mac and Linux gamepad backends are not threaded.) This is a thread race, where the background thread posts an event to the main thread containing a reference to a struct from inside an nsTArray, and then the background thread mutates the nsTArray, invalidating the reference before the main thread gets to run.

The only way to invoke this behavior is to connect or disconnect a gamepad, or to provide gamepad input during shutdown.
Thanks. I'm going to lower the rating a bit, as this sounds like it would be tricky to exploit in practice.
If we want to enable Gamepad API in FF29, need to get a fix for this rather soon.
From comment 16, it sounds like Ted has a patch in progress here, so I'm going to assign this to him.  Any progress on this, Ted?  Should we disable gamepad in 29?
Assignee: nobody → ted
I have a patch in hand, I'm testing locally now. If it seems good I'll push to try and get some QA testing on it. I think it should be safe enough to land on beta.
This patch removes all reference members from the event classes, instead passing the gamepad data by value. It also makes one other important change: when a gamepad is found to no longer be present during a rescan, the background thread notifies the main thread but does not remove it from the array. Once the main thread has processed that removal (by telling the global gamepad service), it marks that gamepad for actual removal and triggers an extra rescan on the background thread, at which point the gamepad is removed from the array.

This patch is a bit hard to read because I reordered the definitions of GamepadEvent/GamepadChangeEvent and WindowsGamepadService so that the former could call the latter. I'll attach a (non-working) version of the patch that contains all the changes without the reordering, it's a lot easier to read. Alternately I suppose I could just declare the Run methods out-of-line, that didn't occur to me until just now.

I can't reproduce any of the crashes with this, I'll push to Try so QA can give it a shot.

Also I hate all of this code. I think in the near future I'm going to rewrite this all using the Raw Input API, which delivers data via Windows messages and won't require threading.
Attachment #8401445 - Flags: review?(bugs)
Attached patch same patch, without reordering (obsolete) — Splinter Review
As promised, same patch without the reordering.
Comment on attachment 8401445 [details] [diff] [review]
be smarter about sending gamepad updates from the background thread

So this id handling is odd. It is hard to follow which id maps to what.
We have mGamepadID, we have mID and mLocalID.

>+// This method is called from the main thread.
>+void
>+WindowsGamepadService::SetGamepadID(int localID, int serviceID) {
>+  mGamepads[localID].id = serviceID;
>+}
>+
>+// This method is called from the main thread.
>+void WindowsGamepadService::RemoveGamepad(int id) {
>+  mGamepads[id].remove = true;
>+  // Signal background thread to remove device.
>+  DevicesChanged(DeviceChangeStable);
>+}
I don't see how these are thread safe. You may change mGamepads arrays in the other
thread while you're about to set .id or .remove, and that leads to set
some random bit somewhere:
- main thread accesses mGamepads[foo]
- the other thread runs and adds something to mGamepads and it contents get memmoved
- main thread sets .id or .remove.
Attachment #8401445 - Flags: review?(bugs) → review-
You're right, I didn't think that entirely through. I worked through the removal steps in my head and was fairly confident about it, although I guess if you had multiple removals at a time even that wouldn't necessarily be safe. I'll put the lock I originally had back in, it doesn't complicate much.
You can't just post the changes from thread to another?
Or is that difficult with the Windows Gamepad thread ?
With appropriate locking.
Attachment #8401816 - Flags: review?(bugs)
Attachment #8401445 - Attachment is obsolete: true
Just for sanity, here's the interdiff from the previous patch.
Attachment #8401446 - Attachment is obsolete: true
Alexandra, can you test this try build and see if it fixes the issue for you?
https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/try-builds/tmielczarek@mozilla.com-2918273d60db/try-win32/

I suspect this will also fix bug 963053.
Flags: needinfo?(alexandra.lucinet)
Blocks: 963053
Comment on attachment 8401816 [details] [diff] [review]
be smarter about sending gamepad updates from the background thread

You don't protect all the mGamepads accesses with mutex, so when
you do mGamepads[i].remove, some other thread might at the same time
mutate the array.
Attachment #8401816 - Flags: review?(bugs) → review-
I'm pretty well certain the previous patch was sufficient since there are only two threads in play, but here's an updated patch which adds locking around every access of mGamepads. It's a little uglier because I indented large blocks to use MutexAutoLock.
Attachment #8401997 - Flags: review?(bugs)
Attachment #8401816 - Attachment is obsolete: true
Comment on attachment 8401997 [details] [diff] [review]
be smarter about sending gamepad updates from the background thread

(all this code should follow the coding style, but better to not change that here.)

>+  void SetGamepadID(int localID, int serviceID);
>+  void RemoveGamepad(int id);
So what id does RemoveGamepad take?
(localID, but please change the param name)

>+WindowsGamepadService::SetGamepadID(int localID, int serviceID) {
>+  MutexAutoLock lock(mMutex);
>+  mGamepads[localID].id = serviceID;
>+}
>+
>+// This method is called from the main thread.
>+void WindowsGamepadService::RemoveGamepad(int id) {
>+  MutexAutoLock lock(mMutex);
>+  mGamepads[id].remove = true;
>+  // Signal background thread to remove device.
>+  DevicesChanged(DeviceChangeStable);
>+}
Please please use consistent id naming here too
Attachment #8401997 - Flags: review?(bugs) → review+
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #34)
> Alexandra, can you test this try build and see if it fixes the issue for you?
> https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/try-builds/
> tmielczarek@mozilla.com-2918273d60db/try-win32/
> 
> I suspect this will also fix bug 963053.

With STR from comment 0 I get the following crashes on Windows 7 x64:
bp-57368d44-54cb-4606-8bae-1b64b2140407
bp-772b9c86-08eb-4716-92c8-72b002140407

On the same machine, bug 963053 it's not reproducible with the try build.
Flags: needinfo?(alexandra.lucinet)
So while this patch fixes an issue with the gamepad service, it doesn't seem to fix the issue that was originally reported here.
I changed all the variables named "id" to consistently use "localID" or "globalID", which ought to be better.
Attachment #8401997 - Attachment is obsolete: true
So, upon further review (having loaded all this code context into my head while writing that other patch), I think the original crash that this bug was filed for is just a simple null deref. This patch should fix it.
Attachment #8402602 - Flags: review?(bugs)
Attachment #8402602 - Flags: review?(bugs) → review+
https://hg.mozilla.org/mozilla-central/rev/aff859f63f70
https://hg.mozilla.org/mozilla-central/rev/c65cceb8ce46

So being a sec-high that affects multiple branches, this should have had security approval before landing.
https://wiki.mozilla.org/Security/Bug_Approval_Process

That said, please nominate this for Aurora/Beta approval ASAP. And fill out the sec-approval request ASAP. Also, does this affect any B2G releases or is this disabled there?
Status: NEW → RESOLVED
Closed: 10 years ago
status-b2g-v1.2: --- → ?
status-b2g-v1.3: --- → ?
status-b2g-v1.4: --- → ?
Flags: needinfo?(ted)
Resolution: --- → FIXED
Target Milestone: --- → mozilla31
We haven't shipped this in any release yet, it's preffed off currently. It's enabled on Nightly/Aurora only. I don't know whether that makes the existing releases "unaffected" or not. This does not affect B2G at all, it's not enabled there.

Per comment 20, this is unlikely to be exploitable (although hard to rule out, certainly).
That being said, I do want to land it on Aurora/Beta so we *can* actually ship this API.
Comment on attachment 8402601 [details] [diff] [review]
be smarter about sending gamepad updates from the background thread

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Gamepad API
User impact if declined: Crashes in certain circumstances using Gamepad API during browser shutdown/removing controller.
Testing completed (on m-c, etc.): Basic testing completed on nightly
Risk to taking this patch (and alternatives if risky): Fairly low-risk, Gamepad API has not been shipped yet. Alternative is to disable Gamepad API for another 2 releases :-(
String or IDL/UUID changes made by this patch: None
Attachment #8402601 - Flags: approval-mozilla-beta?
Attachment #8402601 - Flags: approval-mozilla-aurora?
Comment on attachment 8402602 [details] [diff] [review]
null check GamepadService in case of events still in play during shutdown

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Gamepad API
User impact if declined: Possible crashes when using gamepad during browser shutdown/controller removal
Testing completed (on m-c, etc.): Basic QA testing done on try builds
Risk to taking this patch (and alternatives if risky): Very low risk, simply adds some null checks. Alternative is to disable Gamepad API for another 2 releases.
String or IDL/UUID changes made by this patch: None
Attachment #8402602 - Flags: approval-mozilla-beta?
Attachment #8402602 - Flags: approval-mozilla-aurora?
Comment on attachment 8402601 [details] [diff] [review]
be smarter about sending gamepad updates from the background thread

They are going to make it for beta7.
Attachment #8402601 - Flags: approval-mozilla-beta?
Attachment #8402601 - Flags: approval-mozilla-beta+
Attachment #8402601 - Flags: approval-mozilla-aurora?
Attachment #8402601 - Flags: approval-mozilla-aurora+
Attachment #8402602 - Flags: approval-mozilla-beta?
Attachment #8402602 - Flags: approval-mozilla-beta+
Attachment #8402602 - Flags: approval-mozilla-aurora?
Attachment #8402602 - Flags: approval-mozilla-aurora+
I wasn't able to get any crashes on the try build from comment 43, latest Aurora and latest Nightly using two different machines with both Windows 7 32bit and 64bit. 
I will test as well on Firefox 29 beta 7 when it will be released.
Keywords: verifyme
I've tested with Firefox 29 beta 7 on two machines with Windows 7 32bit and 64bit and I haven't been able to crash Firefox.
Status: RESOLVED → VERIFIED
Keywords: verifyme
Group: core-security
Component: Event Handling → User events and focus handling
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: