Closed Bug 530044 Opened 15 years ago Closed 14 years ago

Stack overflow on corrupted newsgroup Crash [@ arena_malloc_small ] looping through nsMsgQuickSearchDBView::ListIdsInThreadOrder

Categories

(MailNews Core :: Database, defect)

x86_64
All
defect
Not set
critical

Tracking

(thunderbird3.0 .1-fixed)

RESOLVED FIXED
Thunderbird 3.1a1

People

(Reporter: mcepl, Assigned: Bienvenu)

Attachments

(7 files, 2 obsolete files)

User-Agent:       Mozilla/5.0 (X11; U; Linux x86_64; cs-CZ; rv:1.9.1.5) Gecko/20091105 Fedora/3.5.5-1.fc12 Firefox/3.5.5
Build Identifier: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.5) Gecko/20091119 Fedora/3.0-3.11.rc1.fc12 Lightning/1.0pre Thunderbird/3.0

Apparently I've got a corrupted index on the newsgroup gmane.linux.redhat.fedora.devel, and whenever I enter the newsgroup TB crashes. Abrt (the automatic crash reporting tool in Fedora) falls to pieces, however, because it generates a 45+ MB backtrace (approx. 1M lines of text).

According to Martin Stránský it is a stack overflow bug. Attached is the bzip2-compressed backtrace.

Reproducible: Always

Steps to Reproduce:
1. See above.

Actual Results:
Entering the newsgroup leads to a crash.

Expected Results:
It shouldn't crash. Whatever corruption happened in the NG index (and it shouldn't get corrupted in the first place), it shouldn't bring TB to its knees.
Hardware: x86 → x86_64
Attached file backtrace
Matej, can you save the .msf for that newsgroup too?

Did you get the backtrace with gdb?
Severity: normal → critical
Component: General → Database
Keywords: crash
Product: Thunderbird → MailNews Core
QA Contact: general → database
Version: unspecified → Trunk
Whilst this may not be the issue, I see that you have enigmail 0.97a installed. Please can you either uninstall that or run in safe mode.

There are known issues with enigmail 0.97a that cause crashes or strange effects. You should definitely update it to the latest nightly build of enigmail.
Here's the stack trace in ultra-condensed form:

nsMsgQuickSearchDBView::ListIdsInThreadOrder
nsMsgQuickSearchDBView::ListIdsInThreadOrder
[ etc. ]
parentKey is 124112, then 124141, then 124112, then repeat again. Yet another coder who assumed that threads could never have cycles! :-)
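To illustrate the cycle described above, here is a minimal sketch of detecting a loop in a chain of parent keys before recursing over it. It is not the Thunderbird code: `MsgKey`, `kNoParent`, the `parentOf` map, and `ParentChainHasCycle` are all hypothetical names made up for this example, and the real database stores thread structure differently.

```cpp
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

using MsgKey = std::uint32_t;             // hypothetical stand-in for a message key
constexpr MsgKey kNoParent = 0xffffffff;  // hypothetical "no parent" sentinel

// Returns true if following parent links from `start` revisits a key.
// `parentOf` maps each message key to its parent key; a key that is missing
// from the map, or mapped to kNoParent, is treated as a thread root.
bool ParentChainHasCycle(const std::unordered_map<MsgKey, MsgKey>& parentOf,
                         MsgKey start) {
  std::unordered_set<MsgKey> seen;
  MsgKey cur = start;
  while (cur != kNoParent) {
    if (!seen.insert(cur).second)
      return true;  // key already visited: the chain loops back on itself
    auto it = parentOf.find(cur);
    if (it == parentOf.end())
      return false;  // no recorded parent: reached a root
    cur = it->second;
  }
  return false;
}
```

With the two keys from the backtrace, a map in which 124112 points to 124141 and 124141 points back to 124112 is reported as cyclic, while an ordinary chain that ends at a root is not.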
(In reply to comment #2)
> Matej can you save the .msf for that newsgroup too ?

Yes, I can, but I am afraid it would be useless, because I have reindexed the group already and TB doesn't crash on it anymore.

> You got the backtrace with gdb ?

Yes, this was generated by abrt, which uses gdb (and Fedora -debuginfo packages).
Attached file gmane.linux.redhat.fedora.devel.msf (obsolete)
Fortunately, Thunderbird got the same folder corrupted again, and it crashes every time I go to this group. This is the .msf file.
Attachment #413630 - Attachment is obsolete: true
Is there a public news server I can access this newsgroup from?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: testcase
Attachment #413633 - Attachment description: gmane.linux.redhat.fedora.devel.dat → nntp.gmane.org/gmane.linux.redhat.fedora.devel.dat
Attachment #413632 - Attachment description: gmane.linux.redhat.fedora.devel.msf → nntp.gmane.org/gmane.linux.redhat.fedora.devel.msf
Attachment #413632 - Attachment mime type: application/x-gzip → application/octet-stream
(In reply to comment #9)
> Is there a public news server I can access this newsgroup from?

just plain news.gmane.org
(see http://dir.gmane.org/gmane.linux.redhat.fedora.devel and http://gmane.org/about.php)

And this is an overview of all the attached files, as seen from /home/matej/.thunderbird/vq2fybjd.default/News:

bradford:News$ ls -l */*fedora.devel*
-rw-------. 1 matej matej   182620 20. lis 12.15 news.gmane.org/gmane.linux.redhat.fedora.devel
-rw-r--r--. 1 matej matej       25 13. říj 14.21 news.gmane.org/gmane.linux.redhat.fedora.devel.dat
-rw-rw-r--. 1 matej matej 51295625 20. lis 18.32 news.gmane.org/gmane.linux.redhat.fedora.devel.msf
-rw-r--r--. 1 matej matej       25 19. zář 23.23 nntp.gmane.org/gmane.linux.redhat.fedora.devel.dat
-rw-r--r--. 1 matej matej    11230 19. zář 23.23 nntp.gmane.org/gmane.linux.redhat.fedora.devel.msf
bradford:News$
nominating for blocking, though ride-along is much more likely, if I can find a simple fix.
Assignee: nobody → bienvenu
Status: NEW → ASSIGNED
Flags: blocking-thunderbird3?
Keywords: testcase
Drivers don't think this is significant enough to block on, but we'd take a ride-along patch later or possibly something for a dot release.
Flags: blocking-thunderbird3? → blocking-thunderbird3-
Whiteboard: [tb3ride-along]
The attached .msf file doesn't crash for me with a 3.0 build - since you were crashing in quick search code, you must have had a view selected, or done a quick search. Do you know what that might have been?

Is Fedora using the about to ship 3.0 TB code?
(In reply to comment #15)
> The attached .msf file doesn't crash for me with a 3.0 build - since you were
> crashing in quick search code, you must have had a view selected, or done a
> quick search. Do you know what that might have been?

Threaded view with unread messages

> Is Fedora using the about to ship 3.0 TB code?

Sorry, I don't understand. Yes, this is the Fedora build of TB (available at http://koji.fedoraproject.org/koji/buildinfo?buildID=141955). And no, I don't think this is the final TB 3.0 package for Fedora. Of course we will have at least one more build if/when you release it. I guess we may even do one more build for the real RC1.
(In reply to comment #16)

> > Is Fedora using the about to ship 3.0 TB code?

I just meant how closely are you tracking the comm central 1.9.1 branch. If you're within a day or two, then you have all the fixes. 

Oh, darn, I think I needed your .newsrc file (or at least the line for this newsgroup) in order to see what you're seeing with view | unread messages.
(In reply to comment #17)
> I just meant how closely are you tracking the comm central 1.9.1 branch. If
> you're within a day or two, then you have all the fixes. 

Adding the actual maintainer of Thunderbird at Red Hat to the CC list of this bug.

> Oh, darn, I think I needed your .newsrc file (or at least the line for this
> newsgroup) in order to see what you're seeing with view | unread messages.

Unfortunately, I am not at my computer ATM, so will attach when I am back at home.
Attached file newsrc files
bradford:~$ locate News/newsrc-
/home/matej/.thunderbird/vq2fybjd.default/News/newsrc-news.cs.felk.cvut.cz
/home/matej/.thunderbird/vq2fybjd.default/News/newsrc-news.eclipse.org
/home/matej/.thunderbird/vq2fybjd.default/News/newsrc-news.felk.cvut.cz
/home/matej/.thunderbird/vq2fybjd.default/News/newsrc-news.gmane.org
/home/matej/.thunderbird/vq2fybjd.default/News/newsrc-news.grc-1.com
/home/matej/.thunderbird/vq2fybjd.default/News/newsrc-news.grc-2.com
/home/matej/.thunderbird/vq2fybjd.default/News/newsrc-news.grc.com
/home/matej/.thunderbird/vq2fybjd.default/News/newsrc-news.mozilla-1.org
/home/matej/.thunderbird/vq2fybjd.default/News/newsrc-news.mozilla.org
/home/matej/.thunderbird/vq2fybjd.default/News/newsrc-nntp.gmane.org
/home/matej/.thunderbird/vq2fybjd.default/News/newsrc-post-office.corp.redhat.com
bradford:~$ zip -9rT newsrc.zip $(locate News/newsrc-)
  adding: home/matej/.thunderbird/vq2fybjd.default/News/newsrc-news.cs.felk.cvut.cz (deflated 9%)
  adding: home/matej/.thunderbird/vq2fybjd.default/News/newsrc-news.eclipse.org (deflated 15%)
  adding: home/matej/.thunderbird/vq2fybjd.default/News/newsrc-news.felk.cvut.cz (deflated 9%)
  adding: home/matej/.thunderbird/vq2fybjd.default/News/newsrc-news.gmane.org (deflated 59%)
  adding: home/matej/.thunderbird/vq2fybjd.default/News/newsrc-news.grc-1.com (stored 0%)
  adding: home/matej/.thunderbird/vq2fybjd.default/News/newsrc-news.grc-2.com (deflated 9%)
  adding: home/matej/.thunderbird/vq2fybjd.default/News/newsrc-news.grc.com (deflated 35%)
  adding: home/matej/.thunderbird/vq2fybjd.default/News/newsrc-news.mozilla-1.org (deflated 43%)
  adding: home/matej/.thunderbird/vq2fybjd.default/News/newsrc-news.mozilla.org (deflated 35%)
  adding: home/matej/.thunderbird/vq2fybjd.default/News/newsrc-nntp.gmane.org (deflated 67%)
  adding: home/matej/.thunderbird/vq2fybjd.default/News/newsrc-post-office.corp.redhat.com (deflated 35%)
test of newsrc.zip OK
bradford:~$
(In reply to comment #3)
> Whilst this may not be the issue, I see that you have enigmail 0.97a installed.
> Please can you either uninstall that or run in safe mode.
> 
> There are known issues with enigmail 0.97a that cause crashes or strange
> effects. You should definitely update it to a latest nightly build of enigmail.

I have uninstalled enigmail and I still get crashes like https://bugzilla.redhat.com/attachment.cgi?id=373277 (from the closed bug https://bugzilla.redhat.com/show_bug.cgi?id=540694).
Summary: Stack overflow on corrupted newsgroup → Stack overflow on corrupted newsgroup Crash @[arena_malloc_small ]
Summary: Stack overflow on corrupted newsgroup Crash @[arena_malloc_small ] → Stack overflow on corrupted newsgroup Crash [@ arena_malloc_small ]
Currently the #10 crasher for 3.0, 1.2% of crashes.

To see whether all arena_malloc_small crashes were this bug, I checked a 2-week period for 3.0b4. All contain nsMsgQuickSearchDBView::ListIdsInThreadOrder. I spot-checked some other signatures containing arena_malloc_small, and none contain nsMsgQuickSearchDBView::ListIdsInThreadOrder. (nsMsgQuickSearchDBView::ListIdsInThreadOrder isn't in the top 10 frames of any 3.0b4 or 3.0 crash in the last two months; it appears at frame 12 and higher.)

example stack bp-f606d26c-a838-40d9-baf0-caf2b2091118
0	mozcrt19.dll	arena_malloc_small	 objdir-tb/mozilla/memory/jemalloc/src/jemalloc.c:4055
1	mozcrt19.dll	malloc	objdir-tb/mozilla/memory/jemalloc/src/jemalloc.c:6177
2	mozcrt19.dll	operator new	objdir-tb/mozilla/memory/jemalloc/src/new.cpp:54
3	thunderbird.exe	orkinHeap::Alloc	db/mork/src/orkinHeap.cpp:90
4	thunderbird.exe	morkNext::MakeNewNext	db/mork/src/morkNode.cpp:182
5	thunderbird.exe	morkTable::NewTableRowCursor	db/mork/src/morkTable.cpp:1540
6	thunderbird.exe	morkTable::GetTableRowCursor	db/mork/src/morkTable.cpp:458
7	thunderbird.exe	nsMsgThread::GetChildHdrAt	mailnews/db/msgdb/src/nsMsgThread.cpp:533
8	thunderbird.exe	nsMsgThread::GetChildHdrForKey	mailnews/db/msgdb/src/nsMsgThread.cpp:1069
9	thunderbird.exe	nsMsgThread::GetRootHdr	mailnews/db/msgdb/src/nsMsgThread.cpp:956
10	thunderbird.exe	nsMsgThreadEnumerator::nsMsgThreadEnumerator	mailnews/db/msgdb/src/nsMsgThread.cpp:699
11	thunderbird.exe	nsMsgThread::EnumerateMessages	mailnews/db/msgdb/src/nsMsgThread.cpp:905
12	thunderbird.exe	nsMsgQuickSearchDBView::ListIdsInThreadOrder	mailnews/base/src/nsMsgQuickSearchDBView.cpp:641 
(repeats)
16176	thunderbird.exe	nsMsgQuickSearchDBView::ListIdsInThreadOrder	 mailnews/base/src/nsMsgQuickSearchDBView.cpp:667
16177	thunderbird.exe	nsMsgQuickSearchDBView::ListIdsInThreadOrder	mailnews/base/src/nsMsgQuickSearchDBView.cpp:667
16178	thunderbird.exe	nsMsgQuickSearchDBView::ListIdsInThreadOrder	mailnews/base/src/nsMsgQuickSearchDBView.cpp:680
16179	thunderbird.exe	nsMsgQuickSearchDBView::SortThreads	mailnews/base/src/nsMsgQuickSearchDBView.cpp:531
16180	thunderbird.exe	nsMsgThreadedDBView::Sort	mailnews/base/src/nsMsgThreadedDBView.cpp:361
16181	thunderbird.exe	nsMsgQuickSearchDBView::OnSearchDone	mailnews/base/src/nsMsgQuickSearchDBView.cpp:335
16182	thunderbird.exe	nsMsgSearchSession::NotifyListenersDone	mailnews/base/search/src/nsMsgSearchSession.cpp:598
blocking-thunderbird3.0: --- → ?
Keywords: testcase, topcrash
OS: Linux → All
Summary: Stack overflow on corrupted newsgroup Crash [@ arena_malloc_small ] → Stack overflow on corrupted newsgroup Crash [@ arena_malloc_small ] looping through nsMsgQuickSearchDBView::ListIdsInThreadOrder
currently #33 crash for 3.0 and dropping.  rare in nightlies (like only a couple a month)

bp-085fc527-d2c3-4462-9dca-c03fb2090924 mentions ... rapidly clicking between newsgroups/messages on a secure server

bp-ea7198a5-ebd7-4c63-8438-6e4f12091107 changed the threading in the moz seamonkey NG
Attached patch proposed fix (obsolete)
This is analogous to what we do in normal threaded views, and should fix the stack overflow. I don't have a reproducible case, however.
Attachment #416825 - Flags: superreview?(neil)
Attachment #416825 - Flags: review?(neil)
(In reply to comment #25)
> Created an attachment (id=416825)
> 
> this is analogous to what we do in normal threaded views
Normal threaded views compare *pNumListed to numChildren, is there a good reason not to do that (i.e. copy lines 5155-5166 of nsMsgDBView.cpp) here?
This is more like the code in nsMsgDBView.cpp, except that I don't want to blow away the db for this situation. I'd like to try to repair the corruption, but I don't have a test-case to reproduce the bug...
Attachment #416825 - Attachment is obsolete: true
Attachment #417705 - Flags: superreview?(neil)
Attachment #417705 - Flags: review?(neil)
Attachment #416825 - Flags: superreview?(neil)
Attachment #416825 - Flags: review?(neil)
Whiteboard: [tb3ride-along] → [tb3ride-along][has patch for review]
It happened to me today.

Crash Id 8eab8e40-0728-4a34-a7e5-3086b2091216

Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.3a1pre) Gecko/20091215 Lightning/1.1a1pre Shredder/3.1a1pre ID:20091215062043
Whiteboard: [tb3ride-along][has patch for review] → [has patch][needs review neil]
Comment on attachment 417705
slightly different check

>+  // If we discover depths of more than numChildren,
Nit: comment out of date ;-)

>+      // Technically, this is an error, but forcing a database rebuild
>+      // is too destructive so we just return.
>+      if (*pNumListed > numChildren)
In fact, it takes two more children than we were expecting to trigger this. Fortunately this is still using InsertMsgHdrAt rather than SetMsgHdrAt so it doesn't matter yet.
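As a rough illustration of the kind of guard under review (not the actual patch in attachment 417705), the sketch below stops a recursive thread walk once more messages have been listed than the thread claims to contain. `MsgKey`, `ChildMap`, and `ListChildrenBounded` are hypothetical names, and the thread is modelled as a plain map rather than the real database objects.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

using MsgKey = std::uint32_t;  // hypothetical stand-in for a message key

// Hypothetical thread model: each key maps to the keys that name it as parent.
using ChildMap = std::unordered_map<MsgKey, std::vector<MsgKey>>;

// Appends the descendants of `parent` to `out` in thread order. Returns false
// and stops as soon as more messages have been listed than the thread is
// supposed to contain, which can only happen when the parent links are corrupt.
bool ListChildrenBounded(const ChildMap& children, MsgKey parent,
                         std::size_t numChildren, std::vector<MsgKey>& out) {
  auto it = children.find(parent);
  if (it == children.end())
    return true;  // leaf message: nothing to list
  for (MsgKey child : it->second) {
    out.push_back(child);
    if (out.size() > numChildren)
      return false;  // listed too many: bail out instead of recursing forever
    if (!ListChildrenBounded(children, child, numChildren, out))
      return false;
  }
  return true;
}
```

In this sketch a cyclic pair of messages makes the walk return false after a bounded number of steps, leaving whatever was listed so far in `out`, so the caller can show a partial thread rather than forcing a database rebuild.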
Attachment #417705 - Flags: superreview?(neil)
Attachment #417705 - Flags: superreview+
Attachment #417705 - Flags: review?(neil)
Attachment #417705 - Flags: review+
fixed on trunk:
changeset:   4543:27b6c6e10fd2
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 3.1a1
Attachment #417705 - Flags: approval-thunderbird3.0.1?
Whiteboard: [has patch][needs review neil]
Not blocking on this as it has gone down in the rankings, we'll probably take the patch anyway.
blocking-thunderbird3.0: ? → ---
Comment on attachment 417705
slightly different check

a=Standard8
Attachment #417705 - Flags: approval-thunderbird3.0.1? → approval-thunderbird3.0.1+
fixed for 3.0.1
(In reply to comment #34)
> fixed for 3.0.1

Shouldn't this have landed on 'default' hg branch (too)?
Yes, I see what you're trying to say - you mean the 1.9.1 branch, not the trunk...thx for catching this.
(in the process of verifying 3.0.1 fixes)
I suspect this is not gone so being conservative and reopening.

4 crashes in 3.0.1pre in the past 5 days, and crash rate is too low to say the problem diminished at all after checkin. 
bp-49a254dc-220b-4abe-9455-7071a2100103
bp-4ea0bd79-d909-4f07-bded-e51e72100103

Aureliano, was this (comment 28) crash reproducible for you?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to comment #37)
> (in the process of verifying 3.0.1 fixes)
> I suspect this is not gone so being conservative and reopening.

Wayne, as we have tried to fix something that's included in a stable release, can we file a new bug rather than reopening? That way we get to keep track of what's been included in which release. (Ludovic agreed this was the best to do, even if we end up with 10 reports/fixes for one stack).
Status: REOPENED → RESOLVED
Closed: 15 years ago → 14 years ago
Resolution: --- → FIXED
(In reply to comment #38)
> (In reply to comment #37)
> > (in the process of verifying 3.0.1 fixes)
> > I suspect this is not gone so being conservative and reopening.
> 
> Wayne, as we have tried to fix something that's included in a stable release,
> can we file a new bug rather than reopening? That way we get to keep track of
> what's been included in which release. (Ludovic agreed this was the best to do,
> even if we end up with 10 reports/fixes for one stack).

I forgot there was a testcase here. Yes, makes sense to keep this FIXED.

pinged reporters of Bug 532093 and bug 536070 in their respective bugs, as unfortunately they were never asked to test 3.0.1pre (somehow we missed that) and they never commented here. If both report their problem is gone then we'll open a new bug.

Matej, is there a bug#/link that can be added to the "See also" link? Also, perhaps your reporter of the Linux testcase can verify for us that this patch works.
This bug is not fixed since I am seeing frequent crashes even on newly subscribed news groups.
(In reply to comment #42)
> This bug is not fixed since I am seeing frequent crashes even on newly
> subscribed news groups.

What version are you using ?
I am using 3.1b2Pre and have automatic updates enabled, so I am always running the latest. My test environment has ~100 NGs configured across ~15 NNTP servers. The crash seems to happen mainly when showing Unread items in a threaded NG view. Once this starts, I can usually get the NG back in working order by selecting it for offline use, downloading all messages, and rebuilding the index.
Still seeing crashes that are reportedly duplicates of 530044, which is marked as fixed. This crash happens with great frequency while browsing or searching NG posts.

http://crash-stats.mozilla.com/report/index/bp-05575b27-999f-40d0-ae88-598602100312
David, if you have a reproducible crash on a particular newsgroup, if you send me the .msf file for that newsgroup, along with the newsrc file for the server (assuming you're viewing unread only as your quick search), I can try it out.
I will see what I can do about getting you a file for testing. Unfortunately, the news groups where I have encountered this most frequently are private and contain content under NDA. I have also seen this happen on public news groups and will be sure to send you a repeatable example as soon as possible.

One thing I have noticed is that sometimes duplicate posts show up in the affected newsgroups. By duplicate I mean the exact same date/time/subject etc. Rebuilding the index often corrects the duplicate entries, which suggests there is a corruption problem. Perhaps the root cause of the crash is an infinite loop caused by this corruption. The crashes also only seem to happen when the NG display is threaded.

Thanks,
David
Suggest we wait on bienvenu's analysis before we reopen (any bugs) vs creating a new bug. 

But there are crash reports with email addresses, so perhaps we'd be better served treating arena_malloc_small and malloc | operator new(unsigned int) | orkinHeap::Alloc(nsIMdbEnv*, unsigned int, void**) separately?

Notes:

* bug 536070 malloc | operator new(unsigned int) | orkinHeap::Alloc(nsIMdbEnv*, unsigned int, void**)
** reporter is MIA but benb seems to think that case was fixed by the patch in this bug.
** two crashes with email addresses, taylor's and someone at ibm bp-381b1977-3a45-4504-9efb-2aaee2100219

* bug 549105 arena_malloc_small reporter peter says it was fixed for him in 3.0.3 - quite unclear why it went away between v3.0.1 and 3.0.3

* arena_malloc_small i.e. bug 531029 and this bug 
** 5 email addresses in crash reports
** perhaps a candidate for reopening 
** my sense is v3.0.3 crash rate is same as v3.0.1 and v3.0 but I can't say for sure without spending lots of time on this (longer search period than 4 weeks sure would be nice to have, but I wouldn't give any appendages for it)
*** http://crash-stats.mozilla.com/query/query?product=Thunderbird&version=ALL%3AALL&date=&range_value=4&range_unit=weeks&query_search=signature&query_type=exact&query=malloc+|+operator+new%28unsigned+int%29+|+orkinHeap%3A%3AAlloc%28nsIMdbEnv*%2C+unsigned+int%2C+void**%29&build_id=&process_type=all&do_query=1  
*** https://crash-stats.mozilla.com/report/list?product=Thunderbird&build_id=&query_search=signature&query_type=exact&query=arena_malloc_small&date=2%2F15%2F2010&range_value=4&range_unit=weeks&process_type=all&plugin_field=&plugin_query_type=&plugin_query=&do_query=1&signature=arena_malloc_small&missing_sig=&page=1
This bug is still an issue. Crash-stats shows 214 occurrences of that bug during the last 4 weeks (96 of those from TB 3.1.2).
(In reply to comment #49)
> This bug is still an issue. Crash-stats shows 214 occurrences of that bug
> during the last 4 weeks (96 of those from TB 3.1.2).

Given how long this has been fixed for and it was fixed on a branch, please file a new bug as it is easier to track for getting on branches etc.
Blocks: 593007
(In reply to comment #49)
> This bug is still an issue. Crash-stats shows 214 occurrences of that bug
> during the last 4 weeks (96 of those from TB 3.1.2).

Aqualon, do you have a new bug for this?
Crash Signature: [@ arena_malloc_small ]