Thunderbird 50 is crashing if selecting the first inbox [@ InvalidArrayIndex_CRASH | nsMsgKeyArray::GetArray ]

VERIFIED FIXED in Thunderbird 52.0

Status

defect
--
critical
VERIFIED FIXED
3 years ago
2 years ago

People

(Reporter: Nomis101, Assigned: squib)

Tracking

(4 keywords)

Dependency tree / graph

Thunderbird Tracking Flags

(thunderbird49 unaffected, thunderbird50+ fixed, thunderbird51 fixed, thunderbird52 fixed)

Details

(crash signature)

Attachments

(2 attachments)

Reporter

Description

3 years ago
I have four mail accounts in Thunderbird. It is no problem to work with account 2 or 3 or 4. But if I try to select the first inbox (most upper one), than Thunderbird is crashing instantly. This is with Thunderbird 50.0a2. I'm on macOS 10.12 (16A320). I will try to add a lldb debugging output of this crash.

I can not test this with Thunderbird 51, because this version is crashing always immediately at startup.
Reporter

Comment 1

3 years ago
Posted file Apple Crash Report
I could not figure out yet how to debug this with lldb, so for now I'm attaching the Apple crash report. If I've figured out how this lldb is working, I will attach this output as well.

Comment 2

3 years ago
Can you please post the crash report ID from the Troubleshooting Information.
Reporter

Comment 3

3 years ago
I couldn't find the one from 09-13, so I've reproduced the crash. This one is
Crash ID: bp-9b943ebb-e5c3-47ec-bd67-63c202160915
MozCrashReason: ElementAt(aIndex = 0, aLength = 0)
CrashTime: 1473962458

If somebody can guid me how to debug this crash with lldb, I could do this. Seems a lot different like it was with gdb.

Comment 4

3 years ago
Maybe Kent can take a look.
Flags: needinfo?(rkent)
The stack has no hints of mailnews code except for the final event, which is a crash when nsMsgKeyArray::GetArray has an invalid index.

I can't help you with OSX. If this was within Windows and reproducible, here is what I would try to do.

With a debugger on, I would try to see the Js Stack when the crash occurs. Get the debugger to gain control on the crash. Within Visual Studio I would then do Quick Watch with Dump JSStack() which will show the JS stack at the point of interruption in the DOS console. That might give some hints.

I would simplify. Can I move the message to a folder so that only has this message and reproduce the crash?

If you can get it simple enough, then you might me able to audit creations of the nsMsgKey object and try to figure out who is calling it that might affect the length. Maybe add printouts to anything that could affect the length, and see how it is changing.

None of this is easy, which is why we get paid the Big Bucks (or not).
Flags: needinfo?(rkent)
The crash first appears with 50.0a2 20160901004002. Unless nightly builds were broken in the preceeding week, judging by the crash rate and the build date being a Thursday I'd say the regression likely occurred one or two days prior.
https://crash-stats.mozilla.com/signature/?product=Thunderbird&date=%3E2016-06-01&signature=InvalidArrayIndex_CRASH%20%7C%20nsMsgKeyArray%3A%3AGetArray&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_sort=-date&page=1

at least two other users are reporting this
bp-6000ce64-ad73-47fa-ae12-03f7d2160918 t7 using win7
bp-c6da5158-559b-441d-b932-332852160919 bp-05068310-85b1-4c07-9d64-8f8122160919 mxn on Mac

All that said, all the crashes I looked at have mailsummaries addon installed. But it has no recent updates - https://addons.mozilla.org/en-US/thunderbird/addon/mail-summaries/versions/
Severity: major → critical
Crash Signature: [@ InvalidArrayIndex_CRASH | nsMsgKeyArray::GetArray ]
Summary: Thunderbird 50 is crashing if selecting the first inbox → Thunderbird 50 is crashing if selecting the first inbox [@ InvalidArrayIndex_CRASH | nsMsgKeyArray::GetArray ]
Reporter

Comment 7

3 years ago
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #6)
> All that said, all the crashes I looked at have mailsummaries addon
> installed. But it has no recent updates -
> https://addons.mozilla.org/en-US/thunderbird/addon/mail-summaries/versions/

Good point. Uninstalling mailsummaries fixed this crash for me and the crash on startup I had with TB 51 (but now opening an email takes for ages).
Assignee

Comment 8

3 years ago
Yeah, I haven't touched Mail Summaries in a while, so this is probably a regression in a TB codepath that only Mail Summaries exercises.
Assignee

Comment 9

3 years ago
For what it's worth, here's where it's probably crashing in Mail Summaries: https://bitbucket.org/squib/mail-summaries/src/9426db33ac6152df85987190a029da129efdb849/chrome/content/folderSummary.js?at=default&fileviewer=file-view-default#folderSummary.js-156

At least that way, you can see some context.
Turns out this signature arises from https://groups.google.com/forum/#!topic/mozilla.dev.platform/bqKWX4gYN14

Minh Nguyễn gets both this crash and Bug 1192778, so perhaps they are related
See Also: → 1192778
Chris is also seeing this in 50 beta.
 InvalidArrayIndex_CRASH | nsTArray_Impl<T>::ElementAt | nsMsgKeyArray::GetArray 
bp-6a0e18ce-8902-4021-a660-d2bff2161008

It's a somewhat popular addon. We definitely need to get a grip on this
Crash Signature: [@ InvalidArrayIndex_CRASH | nsMsgKeyArray::GetArray ] → [@ InvalidArrayIndex_CRASH | nsMsgKeyArray::GetArray ] [@ InvalidArrayIndex_CRASH | nsTArray_Impl<T>::ElementAt | nsMsgKeyArray::GetArray ]

Comment 12

3 years ago
fyi, from the MailSummaries dev:

"In any case, if you're using a beta of Thunderbird, you'll probably need to use the latest version of the code on Bitbucket, not the released version here. In general, I try to time releases of Mail Summaries to major releases of Thunderbird, since I don't have much time to devote to this project."
Assignee

Comment 13

3 years ago
That's me; I'm already following this bug.

We could really use a regression window on this.
(In reply to Jim Porter (:squib) from comment #13)
> That's me; I'm already following this bug.
> 
> We could really use a regression window on this.

https://mozilla.github.io/mozregression/ is the tool to make this easy.

Can one of you get the regression range?   It would be a great help
Flags: needinfo?(pgnet.dev+mozbugs)
Flags: needinfo?(mxn)
Flags: needinfo?(Nomis101)
Assignee

Comment 15

3 years ago
Well, I've found one bug already, but I don't think it accounts for all the problems I've seen with Mail Summaries.
Assignee

Comment 16

3 years ago
Posted patch Fix itSplinter Review
This fixes it. The issue is that `&array[0]` will trigger an assertion for an empty nsTArray. Instead, we should do the sane thing and just get the begin iterator (which, like std::vector, is a pointer to the beginning of the array).
Assignee: nobody → squibblyflabbetydoo
Status: NEW → ASSIGNED
Attachment #8802697 - Flags: review?(rkent)
Comment on attachment 8802697 [details] [diff] [review]
Fix it

Review of attachment 8802697 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not convinced this will work, but I am not sure of the right thing to do either.

The clone() return for zero length array, as I understand it, is not well defined (derived from malloc which is not well defined). It may or may not be null. If it does return null, then the method will return an error, but I think what you want it to do is return an empty array.

I think what I would prefer is defined behavior. That is, if length is zero, set *aLength to zero, *aKeys to null, and return NS_OK. It would be good to write a very simple unit test (or tack this onto another one) to show that this method, going through XPCOM, will work OK for zero length array.

If you can convince me that the return will be well defined and expected, I could accept your change.
Attachment #8802697 - Flags: review?(rkent) → review-
Assignee

Comment 18

3 years ago
That doesn't appear to be the case, since NS_Alloc (which is what nsMemory::Clone uses under the hood) *used* to assert if you gave it a size of 0, but that was fixed a long time ago: see bug 322908.
Assignee

Comment 19

3 years ago
In any case, I don't believe Gecko uses the C standard library's malloc at all; as far as I'm aware, we always use jemalloc, which returns a non-NULL pointer from malloc(0).

Updated

3 years ago
Blocks: 1296680
Comment on attachment 8802697 [details] [diff] [review]
Fix it

Review of attachment 8802697 [details] [diff] [review]:
-----------------------------------------------------------------

OK thanks, let's do this then.
Attachment #8802697 - Flags: review- → review+
Reporter

Comment 21

3 years ago
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #14)
> (In reply to Jim Porter (:squib) from comment #13)
> > That's me; I'm already following this bug.
> > 
> > We could really use a regression window on this.
> 
> https://mozilla.github.io/mozregression/ is the tool to make this easy.
> 
> Can one of you get the regression range?   It would be a great help

I could try that on the weekend. But if you now have a fix, is this still needed?
Flags: needinfo?(Nomis101)
(In reply to Nomis101 from comment #21)
> (In reply to Wayne Mery (:wsmwk, NI for questions) from comment #14)
> > (In reply to Jim Porter (:squib) from comment #13)
> > > That's me; I'm already following this bug.
> > > 
> > > We could really use a regression window on this.
> > 
> > https://mozilla.github.io/mozregression/ is the tool to make this easy.
> > 
> > Can one of you get the regression range?   It would be a great help
> 
> I could try that on the weekend. But if you now have a fix, is this still
> needed?

Not required. But generally helpful, because it helps us understand *why* this happened so we can potentially avoid it in the future, and also identify other potential fallout caused by the regressing bug.

That said, are we hitting hitting https://groups.google.com/forum/#!topic/mozilla.dev.platform/bqKWX4gYN14 via bug 1159244 ?
Blocks: 1312136
When you land the patch on c-c, please also land it in aurora
Flags: needinfo?(squibblyflabbetydoo)
Assignee

Comment 24

3 years ago
Marking checkin-needed since my HG privileges have lapsed and I don't want to delay things any more by waiting to get them back.
Flags: needinfo?(squibblyflabbetydoo)
Keywords: checkin-needed

Comment 25

3 years ago
https://hg.mozilla.org/comm-central/rev/8c98ed909c9fb53a736f7cfbf83ce3f41c8aff39
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Flags: needinfo?(pgnet.dev+mozbugs)
Flags: needinfo?(mxn)
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 52.0

Comment 26

3 years ago
Comment on attachment 8802697 [details] [diff] [review]
Fix it

Most likely we don't do another TB 50 beta, so no approval for now.
Attachment #8802697 - Flags: approval-comm-beta?
Attachment #8802697 - Flags: approval-comm-aurora+
> Most likely we don't do another TB 50 beta, so no approval for now.

Actually, I'm thinking we should do another beta (I know that isn't part of the original plan) in part because we potentially will need to iterate many times over the next few months to deal with the fallout of bug 1159244. I'd be surprised if we didn't end up with at least half dozen patches, most of which will need uplift to beta to judge effectiveness, unless the patch has a reproducible testcase.  n.b. kent's bug 1312136 comment 4 about bug 1159244 which landed in fx50

Comment 28

3 years ago
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #27)
> > Most likely we don't do another TB 50 beta, so no approval for now.
> Actually, I'm thinking we should do another beta ...
Let me know and I'll do the approvals and uplifts.
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #27)
> > Most likely we don't do another TB 50 beta, so no approval for now.
> 
> Actually, I'm thinking we should do another beta (I know that isn't part of
> the original plan)

Just in case there is any doubt, though I may argue against a particular release, I assume that Wayne serves as the overall release planner, and can make decisions about what releases we do or do not do.
Jorg, please do uplift this to beta. And we'll assess on Thursdday when to build our third beta. (Note, Firefox just posted they extending their 50 beta by a week. So we will have a longer third beta for Thunderbird as well.)

Rationale - the patch seems to be effective for InvalidArrayIndex_CRASH | nsTArray_Impl<T>::ElementAt | nsMsgKeyArray::GetArray [1] which was the #2 crash for nightly (no crashes reported for 10-25 and 10-26 builds). But the crash rate for InvalidArrayIndex_CRASH | nsMsgKeyArray::GetArray and other "array" crashes (7 signatures in the top 100 for nightly) is to low to determine the impact of the patch [3]. So we need the patch on beta to better determine it's impact where there are 13 "array" signatures in the top 200[4].  Plus, we benifit by killing the #1 crash on beta.

[1] https://crash-stats.mozilla.com/signature/?product=Thunderbird&version=52.0a1&signature=InvalidArrayIndex_CRASH%20%7C%20nsTArray_Impl%3CT%3E%3A%3AElementAt%20%7C%20nsMsgKeyArray%3A%3AGetArray&date=%3E%3D2016-10-20T00%3A00%3A00.000Z&date=%3C2016-10-27T00%3A00%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_sort=-date&page=1#reports

[2] https://crash-stats.mozilla.com/signature/?product=Thunderbird&version=52.0a1&signature=InvalidArrayIndex_CRASH%20%7C%20nsTArray_Impl%3CT%3E%3A%3AElementAt%20%7C%20nsMsgKeyArray%3A%3AGetArray&date=%3E%3D2016-10-20T00%3A00%3A00.000Z&date=%3C2016-10-27T00%3A00%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_sort=-date&page=1#reports

[3] https://crash-stats.mozilla.com/topcrashers/?product=Thunderbird&_tcbs_mode=byday&_facets_size=100&_range_type=build&days=14&version=52.0a1
[4] https://crash-stats.mozilla.com/topcrashers/?product=Thunderbird&version=50.0b0&_facets_size=200&days=14
Flags: needinfo?(jorgk)

Updated

3 years ago
Keywords: checkin-needed
PGN, nomis11, is your crash gone when using 50.0b3?
Flags: needinfo?(pgnet.dev+mozbugs)
Flags: needinfo?(Nomis101)

Comment 34

3 years ago
> PGN, nomis11, is your crash gone when using 50.0b3?

Yep, upgrade to TB 50.0b3 and re-enable of MailSummaries and ...

... so far, so good.  No more crash while selecting 1st tab, or any tab manipulation.
Flags: needinfo?(pgnet.dev+mozbugs)
I had another user, Giacomo, also report this crash sig is gone when using 50.0b3.  However, he is now crashing with nsTreeBodyFrame::PrefillPropertyArray https://bugzilla.mozilla.org/show_bug.cgi?id=615958 which he was experiencing prior to getting this array index crash.

Filed a couple followups for
#2 50.0b2 crash - bug 1314348
#4 50.0b2 crash - bug 1314347
It remains to be seen to want rank these signatures will rise for 50.0b3
Reporter

Comment 36

3 years ago
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #33)
> PGN, nomis11, is your crash gone when using 50.0b3?

Yes, the crash described here seems fixed with this patch in TB 50.
Flags: needinfo?(Nomis101)

Updated

3 years ago
No longer blocks: 1296680
crash-stats confirms no crashes starting 50.0b3.
Thanks!
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.