Closed Bug 1368270 Opened 3 years ago Closed 3 months ago
Crash in mozilla::a11y::IDSet::GetID (MSAA id exhaustion)
This bug was filed from the Socorro interface and is report bp-61312828-4aef-4b83-8a9f-5a4fa0170526.
=============================================================
Crashing Thread (0)
Frame  Module  Signature  Source
0  xul.dll  mozilla::a11y::IDSet::GetID()  obj-firefox/dist/include/mozilla/a11y/IDSet.h:76
1  xul.dll  mozilla::a11y::MsaaIdGenerator::GetID()  accessible/windows/msaa/MsaaIdGenerator.cpp:93
2  xul.dll  mozilla::a11y::AccessibleWrap::GetChildIDFor(mozilla::a11y::Accessible*)  accessible/windows/msaa/AccessibleWrap.cpp:1331
3  xul.dll  mozilla::a11y::AccessibleWrap::FireWinEvent(mozilla::a11y::Accessible*, unsigned int)  accessible/windows/msaa/AccessibleWrap.cpp:1230
4  xul.dll  mozilla::a11y::AccessibleWrap::HandleAccEvent(mozilla::a11y::AccEvent*)  accessible/windows/msaa/AccessibleWrap.cpp:1280
5  xul.dll  nsEventShell::FireEvent(mozilla::a11y::AccEvent*)  accessible/base/nsEventShell.cpp:45
6  xul.dll  mozilla::a11y::NotificationController::ProcessMutationEvents()  accessible/base/NotificationController.cpp:552
7  xul.dll  mozilla::a11y::NotificationController::WillRefresh(mozilla::TimeStamp)  accessible/base/NotificationController.cpp:813
8  xul.dll  nsRefreshDriver::Tick(__int64, mozilla::TimeStamp)  layout/base/nsRefreshDriver.cpp:1798
9  xul.dll  mozilla::RefreshDriverTimer::TickDriver(nsRefreshDriver*, __int64, mozilla::TimeStamp)  layout/base/nsRefreshDriver.cpp:326
10  xul.dll  mozilla::RefreshDriverTimer::TickRefreshDrivers(__int64, mozilla::TimeStamp, nsTArray<RefPtr<nsRefreshDriver> >&)  layout/base/nsRefreshDriver.cpp:295
11  xul.dll  mozilla::RefreshDriverTimer::Tick(__int64, mozilla::TimeStamp)  layout/base/nsRefreshDriver.cpp:316
12  xul.dll  mozilla::VsyncRefreshDriverTimer::RunRefreshDrivers(mozilla::TimeStamp)  layout/base/nsRefreshDriver.cpp:663
13  xul.dll  mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::TickRefreshDriver(mozilla::TimeStamp)  layout/base/nsRefreshDriver.cpp:583
14  xul.dll  mozilla::detail::RunnableMethodImpl<void (mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::*)(mozilla::TimeStamp), 1, 0, mozilla::TimeStamp>::Run()  obj-firefox/dist/include/nsThreadUtils.h:810
15  xul.dll  nsThread::ProcessNextEvent(bool, bool*)  xpcom/threads/nsThread.cpp:1216
16  xul.dll  NS_ProcessNextEvent(nsIThread*, bool)  xpcom/glue/nsThreadUtils.cpp:361
17  xul.dll  mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*)  ipc/glue/MessagePump.cpp:124
18  xul.dll  MessageLoop::RunHandler()  ipc/chromium/src/base/message_loop.cc:225
19  xul.dll  MessageLoop::Run()  ipc/chromium/src/base/message_loop.cc:205
20  xul.dll  nsBaseAppShell::Run()  widget/nsBaseAppShell.cpp:156
21  xul.dll  nsAppShell::Run()  widget/windows/nsAppShell.cpp:262
22  xul.dll  nsAppStartup::Run()  toolkit/components/startup/nsAppStartup.cpp:283
23  xul.dll  XREMain::XRE_mainRun()  toolkit/xre/nsAppRunner.cpp:4488
24  xul.dll  XREMain::XRE_main(int, char** const, nsXREAppData const*)  toolkit/xre/nsAppRunner.cpp:4621
25  xul.dll  XRE_main  toolkit/xre/nsAppRunner.cpp:4712
26  firefox.exe  do_main  browser/app/nsBrowserApp.cpp:282
27  firefox.exe  wmain  toolkit/xre/nsWindowsWMain.cpp:115
28  firefox.exe  __scrt_common_main_seh  f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:253
29  kernel32.dll  BaseThreadInitThunk
30  ntdll.dll  __RtlUserThreadStart
31  ntdll.dll  _RtlUserThreadStart

This crash signature on Windows has been showing up since Firefox 52 and in subsequent versions. Nearly all the reports were crashing with "MOZ_CRASH(used up all the available ids)", which was added in bug 606080.
Wow, that's ... remarkable... Sounds like a lot of tabs. Unfortunately there doesn't seem to be any of the known screen readers involved. At least not in the crash from comment #0. Aaron, ever seen this?
Wow... now that was something I was not expecting to see. There is a different assertion that I was expecting to see but that one only shows up when dom.ipc.processCount >= 128. This one I was definitely not expecting. It's implying one of two things. Either: 1) There are so many accessibles that they have exhausted all 2^24 unique ids; or 2) Accessibles are not always releasing their ids when they are destroyed.
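To make the two hypotheses above concrete, here is a minimal sketch of a fixed-width ID pool. It is not the real mozilla::a11y::IDSet; ToyIdPool, Churn and the linear free-slot search are all hypothetical names and choices made for illustration. It only shows that a pool of this kind survives unlimited churn when IDs are released on destruction, and runs dry after at most 2^kBits - 1 objects when they are not.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical fixed-width ID pool; NOT the real mozilla::a11y::IDSet.
template <unsigned kBits>
class ToyIdPool {
 public:
  static constexpr uint32_t kMaxId = (1u << kBits) - 1;  // valid IDs: 1..kMaxId

  // Hand out the lowest free ID, or throw once every ID is taken.
  uint32_t Acquire() {
    for (uint32_t id = 1; id <= kMaxId; ++id) {
      if (!mUsed.test(id)) {
        mUsed.set(id);
        return id;
      }
    }
    throw std::runtime_error("used up all the available ids");
  }

  // Return an ID to the pool so it can be handed out again.
  void Release(uint32_t id) {
    assert(id >= 1 && id <= kMaxId && mUsed.test(id));
    mUsed.reset(id);
  }

  size_t InUse() const { return mUsed.count(); }

 private:
  std::bitset<(1u << kBits)> mUsed;  // slot 0 is never handed out
};

// Simulate `count` short-lived objects. Returns true if the pool survived.
template <unsigned kBits>
bool Churn(ToyIdPool<kBits>& pool, uint32_t count, bool releaseOnDestroy) {
  try {
    for (uint32_t i = 0; i < count; ++i) {
      uint32_t id = pool.Acquire();
      if (releaseOnDestroy) {
        pool.Release(id);  // hypothesis 2 is what happens when this is skipped
      }
    }
  } catch (const std::runtime_error&) {
    return false;
  }
  return true;
}
```

With a 4-bit pool, Churn(pool, 1000, true) succeeds and leaves no IDs in use, while Churn(pool, 1000, false) fails after the 15th object. The same reasoning would apply at 24 bits, only with a much larger limit.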
Given that I've seen this crash linked to MemShrink bugs, I am convinced that this is just a symptom of a bigger problem: there is a leak elsewhere such that nodes (and their associated accessibles) are not being cleaned up and we're exhausting our unique ID space. I don't think there is much we can do here other than fix the bug(s) that are causing the node leakage.
(See https://bugzilla.mozilla.org/show_bug.cgi?id=1372092#c6 for the associated MemShrink bug)
(In reply to Aaron Klotz [:aklotz] (a11y work receiving priority right now, please send interceptor reviews to dmajor or handyman) from comment #3)
> Given that I've seen this crash linked to MemShrink bugs, I am convinced
> that this is just a symptom of a bigger problem: there is a leak elsewhere
> such that nodes (and their associated accessibles) are not being cleaned up
> and we're exhausting our unique ID space.
>
> I don't think there is much we can do here other than fix the bug(s) that
> are causing the node leakage.

I bet if we had more data, we might have more ideas. For example, do we know whether this happens in both e10s and non-e10s builds? Are there correlations with users' surfing habits, e.g. number of open tabs, lifetime of the tabs, or whether the problem is visible on certain websites only?

Also, it might not be Gecko's problem, if I understand it right. If an AT fails to release an accessible object, then we should face this issue sooner or later. If that is the case, what steps can be taken? Can we force clearance of the ID pool at some point? So, how can we approach this?
Setting it to P1. Aaron, do you have actionable ideas on this bug?
Priority: -- → P1
Not at the moment. I will mark it for triage and we'll discuss it tomorrow.
Summary: Crash in mozilla::a11y::IDSet::GetID → Crash in mozilla::a11y::IDSet::GetID (MSAA id exhaustion)
This is a P1 bug without an assignee. P1 are bugs which are being worked on for the current release cycle/iteration/sprint. If the bug is not assigned by Monday, 28 August, the bug's priority will be reset to '--'.
This crashed a tab for one of my web app's testers. My web app is a continuous stream of live data that is presented by adding server-formatted HTML to the page using element.innerHTML = new server data. The server data has around 20 HTML elements, and it is updated 5-10 times a second. The user had only a couple of other non-demanding tabs open. He wasn't using FF to surf other than for testing the web app. He had FF open for a while, maybe a few days. I will try to make a test case that reliably triggers this crash. https://crash-stats.mozilla.com/report/index/f0cbea81-9665-4757-82da-144271180615#tab-details
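As a rough sanity check on the numbers in this report, the sketch below estimates how long such a page would take to consume 2^24 IDs. It assumes, purely for illustration, that every one of the roughly 20 elements in each update gets a fresh ID and that no ID is ever released; HoursToExhaust is a hypothetical helper, not anything from Gecko.

```cpp
#include <cstdint>

// 2^24 IDs, the pool size mentioned earlier in this bug.
constexpr uint64_t kTotalIds = 1ull << 24;
static_assert(kTotalIds == 16777216, "2^24");

// Hours until the pool is empty if every update consumes `elementsPerUpdate`
// fresh IDs and none are ever released (the assumed worst case).
constexpr double HoursToExhaust(uint64_t elementsPerUpdate,
                                uint64_t updatesPerSecond) {
  return static_cast<double>(kTotalIds) /
         static_cast<double>(elementsPerUpdate * updatesPerSecond) / 3600.0;
}
```

Under those assumptions, 20 elements at 5 updates per second gives roughly 46.6 hours, and at 10 updates per second roughly 23.3 hours, which is in the same range as "FF open for a while, maybe a few days".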
(In reply to justinpulliam from comment #9)
> This crashed a tab for one of my web app's testers. My web app is a
> continuous stream of live data that is presented by adding server formatted
> html to the page using element.innerHTML = new server data. The server data
> has around 20 html elements, and it is updated 5-10 times a second.

Thanks for the details. Just to double check, do you mean the data is completely *replaced* every update with 20 elements or that 20 elements are *added* every update (so 20, then 40, then 60, etc. total)? I'm guessing the former, but just wanted to clarify given your use of the term "stream of data".

After bug 1434822, we now force disconnect remote accessibility clients when a content accessible is shut down. That should cause the id to be released. So, if the data is being replaced (not added to), we must be leaking an accessible somehow and not shutting it down.
Closing because no crashes reported for 12 weeks.
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WONTFIX
Closing because no crashes reported for 12 weeks.
There are still some crashes so reopen it.
There are 1000+ crashes for this signature in the past week. Someone should investigate why that bot couldn't find them... How many other bugs like this were erroneously closed? bp-dde060dd-cc7b-4cc8-8fa2-5d8780190106 MOZ_CRASH Reason: MOZ_CRASH(used up all the available ids)
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Flags: needinfo?(hsivonen) → needinfo?(justinpulliam)
Whiteboard: a11y:crash-win → a11y:crash-win [MemShrink]
Whiteboard: a11y:crash-win [MemShrink] → a11y:crash-win
Flags: needinfo?(jteh) → needinfo?(jonahyong)
Assignee: nobody → jteh
Pushed by email@example.com: https://hg.mozilla.org/integration/autoland/rev/38eb7e998859 When shutting down Windows AccessibleWraps, don't clear the id. r=MarcoZ
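The commit message above suggests that clearing the id during shutdown was what kept it from being released. The following is only a guess at the general pattern, not the actual patch: ToyWrap, ToyIdGenerator, kNoId and LeakedAfterLifetime are hypothetical names, and the real AccessibleWrap lifecycle is considerably more involved.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

constexpr uint32_t kNoId = 0;  // hypothetical "no id assigned" sentinel

// Hypothetical stand-in for a process-wide ID generator.
struct ToyIdGenerator {
  std::set<uint32_t> live;
  uint32_t next = 1;
  uint32_t Acquire() {
    live.insert(next);
    return next++;
  }
  void Release(uint32_t id) { live.erase(id); }
};

// Hypothetical wrapper object that owns one ID for its lifetime.
class ToyWrap {
 public:
  ToyWrap(ToyIdGenerator& gen, bool clearIdOnShutdown)
      : mGen(gen), mId(gen.Acquire()), mClearIdOnShutdown(clearIdOnShutdown) {}

  void Shutdown() {
    // Clearing the field here loses the only record of which ID to release.
    if (mClearIdOnShutdown) {
      mId = kNoId;
    }
  }

  ~ToyWrap() {
    if (mId != kNoId) {
      mGen.Release(mId);  // skipped when Shutdown() already cleared mId
    }
  }

 private:
  ToyIdGenerator& mGen;
  uint32_t mId;
  bool mClearIdOnShutdown;
};

// Returns how many IDs are still marked live after one wrapper's lifetime.
inline size_t LeakedAfterLifetime(bool clearIdOnShutdown) {
  ToyIdGenerator gen;
  {
    ToyWrap wrap(gen, clearIdOnShutdown);
    wrap.Shutdown();
  }
  return gen.live.size();
}
```

In this toy model, LeakedAfterLifetime(true) leaves one ID permanently marked as live, while LeakedAfterLifetime(false) leaves none. Repeated over millions of short-lived objects, the first variant would eventually exhaust a bounded ID space.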
Attachment #9165607 - Flags: approval-mozilla-esr78?