Closed Bug 1338908 Opened 8 years ago Closed 8 years ago

Crash in mozilla::EventListenerManager::HandleEventSubType

Categories

(Core :: DOM: Events, defect, P1)

51 Branch
All
Windows
defect

Tracking

()

RESOLVED INCOMPLETE
Tracking Status
firefox-esr45 --- unaffected
firefox51 --- wontfix
firefox52 --- wontfix
firefox-esr52 --- wontfix
firefox53 --- wontfix
firefox54 --- wontfix
firefox55 --- wontfix

People

(Reporter: philipp, Unassigned)

Details

(6 keywords, Whiteboard: [sec-triage-backlog])

Crash Data

This bug was filed from the Socorro interface and is report bp-d7eb9da5-c37a-4449-b427-352602170212. ============================================================= Crashing Thread (0) Frame Module Signature Source 0 xul.dll mozilla::EventListenerManager::HandleEventSubType(mozilla::EventListenerManager::Listener*, nsIDOMEvent*, mozilla::dom::EventTarget*) dom/events/EventListenerManager.cpp:1112 1 xul.dll mozilla::EventListenerManager::HandleEventInternal(nsPresContext*, mozilla::WidgetEvent*, nsIDOMEvent**, mozilla::dom::EventTarget*, nsEventStatus*) dom/events/EventListenerManager.cpp:1286 2 xul.dll mozilla::EventTargetChainItem::HandleEventTargetChain(nsTArray<mozilla::EventTargetChainItem>&, mozilla::EventChainPostVisitor&, mozilla::EventDispatchingCallback*, mozilla::ELMCreationDetector&) dom/events/EventDispatcher.cpp:358 3 xul.dll mozilla::EventDispatcher::Dispatch(nsISupports*, nsPresContext*, mozilla::WidgetEvent*, nsIDOMEvent*, nsEventStatus*, mozilla::EventDispatchingCallback*, nsTArray<mozilla::dom::EventTarget*>*) dom/events/EventDispatcher.cpp:711 4 @0x1200187 this crash signature has been around for a while, but it seems to regress in terms of numbers in firefox 51 - it is affecting all versions of windows there & happens mostly in the content process. also russian locale builds seem to be disproportionately high affected. Correlations for Firefox Release (79.14% in signature vs 02.01% overall) address = 0x4 (56.89% in signature vs 11.15% overall) useragent_locale = ru [71.50% vs 14.61% if startup_crash = null] (79.52% in signature vs 35.94% overall) reason = EXCEPTION_ACCESS_VIOLATION_READ (34.89% in signature vs 65.49% overall) Module "iertutil.dll" = true [26.73% vs 75.65% if platform_pretty_version = Windows 10] (34.77% in signature vs 66.55% overall) Module "wininet.dll" = true [28.71% vs 77.05% if platform_pretty_version = Windows 10] (31.35% in signature vs 66.16% overall) Module "dxva2.dll" = true [45.54% vs 79.80% if platform_pretty_version = Windows 10] (31.35% in signature vs 65.99% overall) Module "mf.dll" = true [45.54% vs 79.70% if platform_pretty_version = Windows 10] (31.35% in signature vs 65.96% overall) Module "evr.dll" = true [45.54% vs 79.70% if platform_pretty_version = Windows 10] (33.25% in signature vs 66.74% overall) Module "mfplat.dll" = true [46.53% vs 80.12% if platform_pretty_version = Windows 10] (06.95% in signature vs 39.09% overall) e10s_cohort = disqualified-test [33.74% vs 58.46% if process_type = null] (79.39% in signature vs 31.41% overall) process_type = content [100.0% vs 84.17% if startup_crash = null] (66.37% in signature vs 87.67% overall) Module "psapi.dll" = true [73.27% vs 94.45% if platform_pretty_version = Windows 10]
Component: Untriaged → DOM: Events
Quick thought: possibly related to bug 1304874?
it's generally spiking up in 51 - also when you discount ru users (so that may one factor playing into the issue, but not the only one): http://bit.ly/2kQpdhV
Looks like this is usually a startup crash and there are about 1000 crashes per week on release. Worth investigating for 53/54 if there is anything actionable here.
Stone, please help investigate this.
Flags: needinfo?(sshih)
Priority: -- → P1
Assignee: nobody → sshih
Flags: needinfo?(sshih)
Found several crashes happened when browsing e.mail.ru. Added some assertions to test FF with it but still can't reproduce it.
(In reply to Ming-Chou Shih [:stone] from comment #5) > Found several crashes happened when browsing e.mail.ru. Added some > assertions to test FF with it but still can't reproduce it. Hi Mike, would it be possible to have your team's assistance in investigating the website implementation? Our current investigation so far points that there seems something suspicious while browsing the inbox of e.mail.ru. Hi Florin, would you or your team be able to help us get STRs for reproducing this? Thank you very much!
Flags: needinfo?(miket)
Flags: needinfo?(florin.mezei)
Moving the ni to Andrei to see if we can find some time for investigation. Note that the team is busy with the release for 52 currently, so this will likely not happen very soon.
Flags: needinfo?(florin.mezei) → needinfo?(andrei.vaida)
Is the crash address always 0 + offset? Is aListener null?
Steps to reproduce the crash would be helpful, my Russian isn't great.
Flags: needinfo?(miket)
I can’t reproduce this issue using Firefox 51.0.1(Build Id: 20170125094131,ru), Firefox 52.0 esr (Build Id:20170303022339, ru) and Firefox 53.0b1 (Build Id:20170307064827, ru) on Windows 10 64 bit (GPU:Intel HD Graphics 530), Windows 10 64 bit (GPU: ATI Radeon HD 3000) and Windows 7 x86 (GPU: AMD Radeon HD 6450). I have spent most of the time trying to reproduce this issue using Firefox 51.0.1 (ru), since it is spiking in 51 build and it is affecting mostly the ru users. I have further investigated if this crash is reproducible using the websites where this has occurred frequently (e.mail.ru and deutsche-bank.de) but I didn’t managed to reproduce it. I have also tried to reproduce this issue using Hola VPN proxy but failed to do so.
Flags: needinfo?(andrei.vaida)
sec-high UAF Of the crashes in the last week, there are many with random addresses, and around a dozen with clear UAF addresses, in addition to the more-common 0x4. Crashes in 52bNN, 52esr, and 51esr(??!) in addition to loads in 51. all on the same source line EventListenerHolder listenerHolder(aListener->mListener.Clone()); Last changes by Xidorn in bug 1287706 Another bug possibly involved higher up the stack was bug 1236979 (dholbert)
Group: core-security
FWIW, I was looking at this a bit and couldn't see anything obvious. Crashes seemed to be related to XPCOM event listeners, so perhaps some C++ caller has bad lifetime managing for some listener?
Group: core-security → dom-core-security
(In reply to Olli Pettay [:smaug] from comment #8) > Is the crash address always 0 + offset? > Is aListener null? It doesn't seem so. mListener is the first field in struct Listener, so if aListener is nullptr, the crash should happen at 0x0 rather than 0x4. Also, if aListener is nullptr, this crash should happen earlier lines like [1], unless "Maybe::ptr()" returns something weird. So the EXCEPTION_ACCESS_VIOLATION_READ cases seem more likely to be that the pointer stored in mListener is 0x4, or its 0x0 but the null-check in NS_IF_ADDREF is removed by the compiler. [1] https://hg.mozilla.org/releases/mozilla-release/annotate/327e081221b0/dom/events/EventListenerManager.cpp#l1260
This error happens almost only on x86 Windows, so it could be that something is mis-optimized somehow.
(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #13) > It doesn't seem so. mListener is the first field in struct Listener, so if > aListener is nullptr, the crash should happen at 0x0 rather than 0x4. Not so fast. mListener is an EventListenerHolder, which is a CallbackObjectHolder. CallbackObjectHolder derives from CallbackObjectHolderBase. CallbackObjectHolderBase needs to have nonzero size because C++ requires all types to have nonzero size (probably 1 in this case), so CallbackObjectHolder most likely stores (on 32-bit) its actual data at an offset of 0x4 (for alignment).
There is a static_assert in CallbackObjectHolder [1] to ensure that it has the size of a pointer, so it is guaranteed that CallbackObjectHolderBase doesn't contribute to the size of the object. [1] https://dxr.mozilla.org/mozilla-central/rev/528e9dbbb882db0b32792d44b5be9cc539afa1a8/dom/bindings/CallbackObject.h#394-397
All EXCEPTION_ACCESS_VIOLATION_EXEC crashes (which are those related to XPCOMCallback mentioned by smaug in comment 12) come from nsDocument::DispatchContentLoadedEvents() [1], so it might be that there is a broken XPCOM listener setup for DOMContentLoaded event. EXCEPTION_ACCESS_VIOLATION_READ cases don't seem to have pattern on what event would trigger. I guess there are actually two different kinds of crashes happen to share the same signature, and we probably should analyse them separately. [1] https://hg.mozilla.org/releases/mozilla-release/annotate/327e081221b0/dom/base/nsDocument.cpp#l4987
About the signature 'EXCEPTION_ACCESS_VIOLATION_READ to address 0x04', using release build 52 to debug, found something about crash in [1] 1. aListener (EventListenerManager::Listener) is not null 2. mPtrBits of CallbackObjectHolder (mListener) is not null (stores the address of the callback object) 3. The vtbl of the callback object pointer is null 4. When calling .Clone() [2], we are jumping to the address of AddRef (vtbl[1] = 0x04) Looks like the callback object is destroyed. [1] http://searchfox.org/mozilla-central/rev/ef0b6a528e0e7b354ddf8cdce08392be8b8ca679/dom/events/EventListenerManager.cpp#1102 [2] http://searchfox.org/mozilla-central/rev/ef0b6a528e0e7b354ddf8cdce08392be8b8ca679/dom/bindings/CallbackObject.h#449
(In reply to Ming-Chou Shih [:stone] from comment #18) > 2. mPtrBits of CallbackObjectHolder (mListener) is not null (stores the > address of the callback object) 2. mPtrBits of CallbackObjectHolder (mListener) is not 0 > 3. The vtbl of the callback object pointer is null 3. The vtbl of the callback object is null
Although the crash reports hint that it may be caused by a dead listener object, still can't find which object caused it.
Whiteboard: [sec-triage-backlog]
This is an assigned P1 bug without activity in two weeks. If you intend to continue working on this bug for the current release/iteration/sprint, remove the 'stale-bug' keyword. Otherwise we'll reset the priority of the bug back to '--' on Monday, August 28th.
Keywords: stale-bug
Checked the graph and the crashes decreased around Mar19. 66.21% of crashes happened on the build 20170125094131. The second high build is 20170627155318 (3.58%). So far I have no idea about next action. Reset the assignee.
Assignee: sshih → nobody
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE
Group: dom-core-security
You need to log in before you can comment on or make changes to this bug.