Startup crash in [@ InvalidArrayIndex_CRASH | mozilla::a11y::Accessible::InsertChildAt]
Categories
(Thunderbird :: General, defect)
Tracking
(thunderbird67+ affected, thunderbird68+ fixed)
People
(Reporter: wsmwk, Assigned: mkmelin)
References
Details
(Keywords: crash, reproducible, topcrash, Whiteboard: [startup crash][Thunderbird-testfailure: Z all][Thunderbird-disabled-test][XBL:3])
Crash Data
Attachments
(2 files)
2.28 KB,
patch
|
Details | Diff | Splinter Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review |
#2 crash for nightlies, 20190222 and 20190223 builds. Looks like 2-3 users.
This bug is for crash report bp-aae15f22-c8f5-4eff-94ad-1d4660190222.
Top 10 frames of crashing thread:
0 mozglue.dll static void MOZ_Crash mfbt/Assertions.h:314
1 mozglue.dll MOZ_CrashPrintf mfbt/Assertions.cpp:55
2 xul.dll InvalidArrayIndex_CRASH xpcom/ds/nsTArray.cpp:27
3 xul.dll mozilla::a11y::Accessible::InsertChildAt accessible/generic/Accessible.cpp:2093
4 xul.dll mozilla::a11y::DocAccessible::MoveChild accessible/generic/DocAccessible.cpp:2125
5 xul.dll mozilla::a11y::DocAccessible::CacheChildrenInSubtree accessible/generic/DocAccessible.cpp:2153
6 xul.dll mozilla::a11y::DocAccessible::CreateSubtree accessible/generic/DocAccessible-inl.h:131
7 xul.dll mozilla::a11y::DocAccessible::ProcessContentInserted accessible/generic/DocAccessible.cpp:1734
8 xul.dll mozilla::a11y::NotificationController::WillRefresh accessible/base/NotificationController.cpp:766
9 xul.dll nsRefreshDriver::Tick layout/base/nsRefreshDriver.cpp:1850
Comment 1•6 years ago
|
||
Sure, that's the crash we're seeing on the tree and which I reported in bug 1529872 comment #6. Ugly.
Should be a dupe of that bug eventually. Sadly FF doesn't crash.
Comment 2•6 years ago
|
||
We saw this crash in MozMill testing in message-header/test-header-toolbar.js. That test passes now without crash after bug 1531282 was fixed. So I declare this ...
Fixed by bug 1531282.
Comment 3•6 years ago
|
||
I was wrong, the crash is still there in testing:
https://treeherder.mozilla.org/#/jobs?repo=comm-central&revision=05bbb10f557f67065a00841335e0801c336c1a5c&selectedJob=231331893
https://treeherder.mozilla.org/#/jobs?repo=comm-central&revision=05bbb10f557f67065a00841335e0801c336c1a5c&selectedJob=231332272
https://treeherder.mozilla.org/#/jobs?repo=comm-central&revision=05bbb10f557f67065a00841335e0801c336c1a5c&selectedJob=231330908
It's related to re-enabling the three tests in mail/test/mozmill/message-header/test-header-toolbar.js:
https://hg.mozilla.org/comm-central/rev/05bbb10f557f67065a00841335e0801c336c1a5c#l1.12
I'll disable them again :-(
Magnus, it's time for you to have a look. The test doesn't fail on its own, but it crashes the entire directory when those three tests are run.
Comment 4•6 years ago
|
||
Victor, any idea what's going on here? This was first reported in bug 1529872 comment #6, which is the bug that captured the bustage after the tree conversion from XBL to CE.
Comment 5•6 years ago
|
||
I'll have a look. Keeping ni?.
Assignee | ||
Comment 6•6 years ago
|
||
Good news and bad news: bug 1535265 will actually rework the tested parts, and those disabled tests are going to be removed in the process. So no crashing tests anymore. Bad news is of course that somewhere in the code there would still be code potentially triggering a crash (but hard to say if it would ever appear in the wild.)
Comment 7•6 years ago
|
||
Alexander, can you make any sense of this a11y crash? Only crashing on Thunderbird.
Reporter | ||
Comment 8•6 years ago
|
||
bp-d1032989-a0a9-4526-be72-01be10190319 is only crash with a comment "closing settings tab".
I'm equally concerned about Bug 1529792 - Crash in [@ nsContentSecurityManager::doContentSecurityCheck] viewing successive nntp articles.
That said, the crash rates are quite low. So if we must wait for fixes then I think we'd benefit from shipping sooner, without fixes, so we start getting beta user feedback on 67 - we can set a low update rate and have time to assess the feedback.
Comment 9•6 years ago
|
||
(Thanks for the reminder, I asked over in the other bug.)
Comment 10•6 years ago
|
||
(In reply to Jorg K (GMT+1) from comment #7)
Alexander, can you make any sense of this a11y crash? Only crashing on Thunderbird.
MoveChild crashes [1] are complicated and hard to understand without having a way to debug it. Is there any steps to reproduce the crash locally?
Comment 11•6 years ago
|
||
str |
Yes, easily reproducible if you build TB. From the object directory you run:
mozmake mozmill-one SOLO_TEST=message-header
It crashes in test-header-toolbar.js, but only if you run it as part of the entire directory. We've currently got three subtests disabled, so in that test change function disabled_test
back to function test
:
https://searchfox.org/comm-central/search?q=function+disabled_test&case=false®exp=false&path=test-header-toolbar.js
Thanks in advance.
Comment 12•6 years ago
|
||
Sorry, on Linux or Mac it's just make
. And it could be that it doesn't work on Mac at all, in which case you need to apply the patch from bug 1533922.
I've just tried it re-enabling the three tests and I still get:
Hit MOZ_CRASH(ElementAt(aIndex = 2, aLength = 0)) at c:/mozilla-source/comm-central/xpcom/ds/nsTArray.cpp:29
Comment 13•6 years ago
|
||
For your convenience to enable the tests.
Reporter | ||
Comment 14•6 years ago
|
||
Alexandar, please see comment 11 - 13
This is now a topcrash for beta and nightly
Reporter | ||
Updated•6 years ago
|
Reporter | ||
Comment 15•6 years ago
|
||
turned off beta updates until we have a patch to uplift.
Comment 16•6 years ago
|
||
Just of the record: I warned on TB drivers and I wanted this bug to be looked at before shipping TB 67 beta. But others knew better :-(
Updated•6 years ago
|
Reporter | ||
Comment 17•6 years ago
|
||
Just of the record: I warned on TB drivers and I wanted this bug to be looked at before shipping TB 67 beta. But others knew better :-(
Actually, I am totally comfortable and satisfied with that decision, and the results of that decision are excellent. The crashing could have been worse, or could have been better. Fact is we (including you) couldn't possibly know until we shipped.
As hoped it's not a crash rate which left users immobilized. Quite the opposite we are at ~50% updated and that crash rate is only double the previous #1 rated crash - that's pretty OK. I only turned off updates so as to not inconvenience those who haven't yet updated and because we now have the benefit of a) knowing there are not other new crashes associated with 67, b) knowing that the release is reasonably usable and c) having enough users on it that we get feedback that wasn't possible from nightlies. Again, that was the desired result.
Reporter | ||
Comment 18•6 years ago
|
||
~75% of crashes are startup, which we failed to note in the analysis.
But ~25% of crashes are NOT startup, which makes this Kinda interesting. Examples below. Do we have two different issues?
bp-f5f7f454-9126-409a-a400-99a920190401 2 minutes uptime
bp-d1032989-a0a9-4526-be72-01be10190319 6 minutes uptime
bp-a2fbc823-7f4d-45d0-bd9f-b394a0190322 16 minutes uptime
And extremely rare, is a couple linux crashes
bp-1ccc5e3e-c447-48dc-b753-9eb060190404 3 hours uptime
bp-6f6f0a7b-1679-44c2-b8ef-e7d5f0190405 17 hours uptime
Assignee | ||
Comment 19•6 years ago
|
||
I looked around, and while I assume there's some higher level bug, this looks very suspicious:
https://searchfox.org/comm-central/rev/29215b2012a35d80ea879bb5403290caa3f41242/mozilla/accessible/generic/DocAccessible.cpp#2129-2149
There we first check if the index is ok, then go on to remove one child, and go on, but then the first check regarding aIdxInParent would not necessarily be true anymore.
Comment 20•6 years ago
|
||
That link doesn't work. You can't use comm-central ... /mozilla links. This is what you want:
https://searchfox.org/mozilla-central/source/accessible/generic/DocAccessible.cpp#2129-2149
Comment 21•6 years ago
|
||
I'm not knowledgeable enough to know what's going on here and it looks like asurkov is looking at this already, removing ni.
Assignee | ||
Comment 22•6 years ago
|
||
At a second look, that's changing different nodes so is not the issue.
But, looks like the below should be && instead of ||. Else -1 (which is passed in to the unsigned index later) is always true.
https://searchfox.org/mozilla-central/source/accessible/generic/DocAccessible.cpp#2135
bool hasInsertionPoint =
(aIdxInParent != -1) ||
(aIdxInParent <= static_cast<int32_t>(aNewParent->ChildCount()));
For -1 that would be false || true, which is still true.
I'll make a patch for that.
Assignee | ||
Updated•6 years ago
|
Comment 23•6 years ago
|
||
(In reply to Magnus Melin [:mkmelin] from comment #22)
At a second look, that's changing different nodes so is not the issue.
But, looks like the below should be && instead of ||. Else -1 (which is passed in to the unsigned index later) is always true.https://searchfox.org/mozilla-central/source/accessible/generic/DocAccessible.cpp#2135
bool hasInsertionPoint =
(aIdxInParent != -1) ||
(aIdxInParent <= static_cast<int32_t>(aNewParent->ChildCount()));For -1 that would be false || true, which is still true.
I'll make a patch for that.
Good catch! Alternatively (aIndexInParent >= 0 && aIndexInParent <= aNewParent->ChildCount()). Might be a bit more readable and error-prone.
Comment 24•6 years ago
|
||
I changed the code in question to
- bool hasInsertionPoint =
- (aIdxInParent != -1) ||
- (aIdxInParent <= static_cast<int32_t>(aNewParent->ChildCount()));
+ bool hasInsertionPoint = (0 <= aIdxInParent &&
+ aIdxInParent <= static_cast<int32_t>(aNewParent->ChildCount()));
and ran our test suite with the crash-producing tests enabled:
mozmake mozmill-one SOLO_TEST=message-header
It crashes just the same: :-(
Hit MOZ_CRASH(ElementAt(aIndex = 2, aLength = 0)) at c:/mozilla-source/comm-central/xpcom/ds/nsTArray.cpp:29
#01: mozilla::a11y::DocAccessible::MoveChild (c:\mozilla-source\comm-central\accessible\generic\DocAccessible.cpp:2149)
#02: mozilla::a11y::DocAccessible::CacheChildrenInSubtree (c:\mozilla-source\comm-central\accessible\generic\DocAccessible.cpp:2176)
Assignee | ||
Comment 25•6 years ago
|
||
Dug further and found the problem, but I'm not sure what the correct fix is.
The above mentioned fix is correct (but irrelevant). What happens is that the ChildCount()s will check check the wrong thing for trees: https://searchfox.org/mozilla-central/rev/6dab6dad9cc852011a14275a8b2c2c03ed7600a7/accessible/xul/XULTreeAccessible.cpp#354
With the below change, things will work correctly for our case (but I assume not in others, didn't check yet):
- if (!true) return childCount;
- if (!mTreeView) return childCount;
Since the childCount will count the rows for trees, and Accessible::InsertChildAt will look only at mChildren.Length() there will be a mismatch leading to a crash.
Assignee | ||
Comment 26•6 years ago
|
||
- if (true) return childCount;
- if (!mTreeView) return childCount;
Assignee | ||
Comment 27•6 years ago
|
||
if (true) was the change that made it work
Comment 28•6 years ago
|
||
Let's count the three disabled tests in test-header-toolbar.js here. Note that they will be removed in bug 1535265, but it would be good to fix the crash first before removing the reproducible evidence.
Comment 29•6 years ago
|
||
(In reply to Magnus Melin [:mkmelin] from comment #25)
Dug further and found the problem, but I'm not sure what the correct fix is.
The above mentioned fix is correct (but irrelevant). What happens is that the ChildCount()s will check check the wrong thing for trees: https://searchfox.org/mozilla-central/rev/6dab6dad9cc852011a14275a8b2c2c03ed7600a7/accessible/xul/XULTreeAccessible.cpp#354
With the below change, things will work correctly for our case (but I assume not in others, didn't check yet):
- if (!true) return childCount;
- if (!mTreeView) return childCount;
Since the childCount will count the rows for trees, and Accessible::InsertChildAt will look only at mChildren.Length() there will be a mismatch leading to a crash.
Accessible::InsertChildAt operates on mChildren, so this part looks correct. It should assert on a wrong index though. I think DocAccessible::CacheChildrenInSubtree could be changed to use mChildren.Length() instead ChildCount(), what should fix the issue, but not sure wheather MoveChild() make sense on XUL trees at all. Do you know what caused MoveChild()? Is it because of aria-owns usage or it comes somehow natively?
Assignee | ||
Comment 30•6 years ago
|
||
For XULTreeAccessible, the ChildCount() is not only the mChildren, so check mChildren directly to make sure we stay within bounds
Assignee | ||
Comment 31•6 years ago
|
||
I made a try run earlier today which looks pretty good (and the THunderbird tests pass)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f885de35e89fd78b19106d682b2cd6a593e38424
Assignee | ||
Comment 32•6 years ago
|
||
Updated DocAccessible::CacheChildrenInSubtree to check mCHildren directly too. Unfortunately I don't know what's causing MoveChild, or the details of why it happens. For the crashing Thunderbird test case it is only triggered when running over all the tests in the directory (and not run one by one), so it's really hard to pinpoint what goes wrong. It does have something to do with customizing toolbar elements.
Reporter | ||
Comment 33•6 years ago
|
||
(In reply to Magnus Melin [:mkmelin] from comment #32)
It does have something to do with customizing toolbar elements.
How might that relate to the crash for users being on startup?
Assignee | ||
Comment 34•6 years ago
|
||
Can't say. Running the mozmill tests is the only way I know that reproduce this. There might be more than one thing triggering the crash of course.
Comment 35•6 years ago
|
||
Updated•6 years ago
|
Comment 36•6 years ago
|
||
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/13e57aaba5a0
re-enable three tests in test-header-toolbar.js. r=me
Comment 37•6 years ago
|
||
bugherder |
Reporter | ||
Comment 38•6 years ago
|
||
Thanks for the patch.
Good news! No trunk crashes after build 20190418003825.
Can we uplift to beta to kill the #2 beta crasher?
Comment 39•6 years ago
|
||
Well, this is really an M-C bug, so I'd have to do it on a branch for the TB 67 beta 2.
Reporter | ||
Comment 40•6 years ago
|
||
(In reply to Jorg K (GMT+2) from comment #39)
Well, this is really an M-C bug, so I'd have to do it on a branch for the TB 67 beta 2.
I was just throwing the question to the wind - whatever you guys arrange, landing on M-beta or a branch, is fine with me so we can turn beta updates back on. Fortunately any branch will be short lived.
Comment 41•6 years ago
|
||
https://hg.mozilla.org/releases/mozilla-beta/rev/7243480d39dc3a57e6f81d3b6bf8cc4f6cf062d5 on THUNDERBIRD670b2_2019042501_RELBRANCH
Comment 42•6 years ago
|
||
TB 67 beta 2:
https://hg.mozilla.org/releases/comm-beta/rev/c6411719c75a4c91d0454aeb529d293b3f3d9ae3
Re-enabled tests to make sure there's no crash ;-)
Description
•