226730 - Slow threading when all messages have the same subject, with mail.strict_threading

Reporter

Description

•

21 years ago

This is a follow-up to bug 172786. Steps to reproduce: 1) Fill a folder with numerous messages with the same subject 2) Turn on threading for this folder Mozilla starts to consume all available CPU. Tracing its process on Linux shows this: kill(5529, SIGRTMIN) = 0 gettimeofday({1069773704, 813034}, NULL) = 0 gettimeofday({1069773704, 814891}, NULL) = 0 gettimeofday({1069773704, 817263}, NULL) = 0 gettimeofday({1069773704, 819174}, NULL) = 0 gettimeofday({1069773704, 820994}, NULL) = 0 gettimeofday({1069773704, 822855}, NULL) = 0 gettimeofday({1069773704, 824665}, NULL) = 0 gettimeofday({1069773704, 826472}, NULL) = 0 gettimeofday({1069773704, 828276}, NULL) = 0 gettimeofday({1069773704, 830056}, NULL) = 0 gettimeofday({1069773704, 831863}, NULL) = 0 gettimeofday({1069773704, 833675}, NULL) = 0 gettimeofday({1069773704, 835475}, NULL) = 0 gettimeofday({1069773704, 837280}, NULL) = 0 gettimeofday({1069773704, 839084}, NULL) = 0 kill(5530, SIGRTMIN) = 0 kill(5530, SIGRTMIN) = 0 kill(5530, SIGRTMIN) = 0 kill(5530, SIGRTMIN) = 0 kill(5530, SIGRTMIN) = 0 kill(5530, SIGRTMIN) = 0 kill(5530, SIGRTMIN) = 0 kill(5530, SIGRTMIN) = 0 kill(5530, SIGRTMIN) = 0 kill(5530, SIGRTMIN) = 0 kill(5530, SIGRTMIN) = 0 kill(5530, SIGRTMIN) = 0 kill(5530, SIGRTMIN) = 0 kill(5530, SIGRTMIN) = 0 kill(5530, SIGRTMIN) = 0 kill(5530, SIGRTMIN) = 0 write(5, "\372", 1) = 1 ioctl(3, FIONREAD, [32]) = 0 read(3, "~\2\t\21ZXA\1\1\20\0\0\20\0\0\0\0\0\0\20\20\20\20\20\0"..., 32) = 32 poll([{fd=3, events=POLLIN}], 1, 0) = 0 write(3, "\230\4\2\0\0\1\0\1", 8) = 8 read(3, "\1\1\n\21\0\0\0\0\20\0\0\20\0\0\0\0\0\0\20\0\0\0\0\0\0"..., 32) = 32 ioctl(3, FIONREAD, [0]) = 0 poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN, revents=POLLIN}], 2, 0) = 1 gettimeofday({1069773710, 224690}, NULL) = 0 gettimeofday({1069773710, 224729}, NULL) = 0 gettimeofday({1069773710, 224861}, NULL) = 0 kill(5529, SIGRTMIN) = 0 kill(5530, SIGRTMIN) = 0 kill(5530, SIGRTMIN) = 0 kill(5530, SIGRTMIN) = 0 kill(5530, SIGRTMIN) = 0 kill(5530, SIGRTMIN) = 0

Aleksander Adamowski

Reporter

Comment 1

•

21 years ago

This problem occurs on a folder where I have a large number of bounce messages generated by a mail server that I admin. Currently the size of that folder is 162 MB, but the problem was significant even when the folder had ~300 messages (that would amount to a couple of MBs only) on a relatively fast hardware (Pentium III, 512MB RAM, UDMA 66 hard drive). See my comments in bug 172786 for details.

Aleksander Adamowski

Reporter

Comment 2

•

21 years ago

I don't know whether this will be useful (the binaries are stripped), but here's the result of next-stepping and back-tracing the running mozilla-bin process (the one consuming CPU). I attached to the process by "gdb - PID": #0 0x4285f900 in NSGetModule () from /usr/lib/mozilla-1.4/components/libmsgdb.so #1 0x40a8bf96 in PL_DHashTableRawRemove () from /usr/lib/mozilla-1.4/libxpcom.so (gdb) next Single stepping until exit from function NSGetModule, which has no line number information. 0x40a8bf96 in PL_DHashTableRawRemove () from /usr/lib/mozilla-1.4/libxpcom.so (gdb) bt #0 0x40a8bf96 in PL_DHashTableRawRemove () from /usr/lib/mozilla-1.4/libxpcom.so #1 0x09cebd28 in ?? () (gdb) next Single stepping until exit from function PL_DHashTableRawRemove, which has no line number information. 0x40a8bf0a in PL_DHashTableOperate () from /usr/lib/mozilla-1.4/libxpcom.so (gdb) bt #0 0x40a8bf0a in PL_DHashTableOperate () from /usr/lib/mozilla-1.4/libxpcom.so #1 0x09cebd28 in ?? () (gdb) next Single stepping until exit from function PL_DHashTableOperate, which has no line number information. 0x4285faf5 in NSGetModule () from /usr/lib/mozilla-1.4/components/libmsgdb.so (gdb) bt #0 0x4285faf5 in NSGetModule () from /usr/lib/mozilla-1.4/components/libmsgdb.so #1 0x09cebd28 in ?? () (gdb) next Single stepping until exit from function NSGetModule, which has no line number information. 0x40a8c063 in PL_DHashTableEnumerate () from /usr/lib/mozilla-1.4/libxpcom.so (gdb) bt #0 0x40a8c063 in PL_DHashTableEnumerate () from /usr/lib/mozilla-1.4/libxpcom.so #1 0x09cbb498 in ?? () (gdb) next Single stepping until exit from function PL_DHashTableEnumerate, which has no line number information. 0x4285f550 in NSGetModule () from /usr/lib/mozilla-1.4/components/libmsgdb.so (gdb) bt #0 0x4285f550 in NSGetModule () from /usr/lib/mozilla-1.4/components/libmsgdb.so #1 0x40a8c063 in PL_DHashTableEnumerate () from /usr/lib/mozilla-1.4/libxpcom.so (gdb) next Single stepping until exit from function NSGetModule, which has no line number information. 0x424d9de0 in NSGetModule () from /usr/lib/mozilla-1.4/components/libmork.so (gdb) bt #0 0x424d9de0 in NSGetModule () from /usr/lib/mozilla-1.4/components/libmork.so #1 0x4286e64a in NSGetModule () from /usr/lib/mozilla-1.4/components/libmsgdb.so

David :Bienvenu

Comment 3

•

21 years ago

thanks, I know exactly what's going on, but it's hard to fix - turning off threading by subject will speed it up for you. user_pref("mail.thread_without_re", false);

Assignee: sspitzer → bienvenu

Component: Mail Window Front End → Mail Database

Aleksander Adamowski

Reporter

Comment 4

•

21 years ago

I've set "mail.thread_without_re" to "false" in about:config, but it didn't help much... It's Mozilla Mozilla 1.4 More specifically, Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030630. The change of pref was effective: user_pref("mail.thread_without_re", false); But still entering the 162 MB folder caused Mozilla to use all CPU for a few hours (Athlon 1800 MHz) and I had to leave it obernight to finish that stuff. Then today in the morning I've selected about 4/5 of messages and pressed DEL. Mozilla started consuming all CPU again. After about 1 hour I've made a strace, then closed Mozilla safely. Strace looked like this: gettimeofday({1069929043, 591423}, NULL) = 0 gettimeofday({1069929043, 593281}, NULL) = 0 lseek(93, 0, SEEK_CUR) = 87017914 lseek(93, 0, SEEK_CUR) = 87017914 lseek(93, 0, SEEK_END) = 87017914 lseek(93, 87017914, SEEK_SET) = 87017914 lseek(93, 87017914, SEEK_SET) = 87017914 gettimeofday({1069929043, 604744}, NULL) = 0 lseek(93, 0, SEEK_CUR) = 87017914 lseek(93, 0, SEEK_CUR) = 87017914 lseek(93, 0, SEEK_END) = 87017914 lseek(93, 87017914, SEEK_SET) = 87017914 lseek(93, 87017914, SEEK_SET) = 87017914 write(93, "From - Mon Jul 14 10:11:13 2003\n"..., 2641) = 2641 lseek(93, 0, SEEK_CUR) = 87020555 lseek(93, 0, SEEK_CUR) = 87020555 lseek(93, 0, SEEK_END) = 87020555 lseek(93, 87020555, SEEK_SET) = 87020555 lseek(93, 87020555, SEEK_SET) = 87020555 lseek(93, 0, SEEK_CUR) = 87020555 kill(16905, SIGRTMIN) = 0 write(10, "\372", 1) = -1 EAGAIN (Resource temporarily unavailable) ioctl(3, FIONREAD, [32]) = 0 read(3, "\34\366\256\353\20\0\300\1\23\1\0\0\227\210\247\5\0\0\0"..., 32) = 32 poll([{fd=3, events=POLLIN, revents=POLLIN}], 1, 0) = 1 ioctl(3, FIONREAD, [32]) = 0 read(3, "\34\0\256\353\20\0\300\1\23\1\0\0\311\210\247\5\1\0\0\0"..., 32) = 32 poll([{fd=3, events=POLLIN, revents=POLLIN}], 1, 0) = 1 ioctl(3, FIONREAD, [32]) = 0 read(3, "\34\366\256\353\20\0\300\1\23\1\0\0\312\210\247\5\0\0\0"..., 32) = 32 poll([{fd=3, events=POLLIN, revents=POLLIN}], 1, 0) = 1 ioctl(3, FIONREAD, [32]) = 0 read(3, "\34\0\256\353\20\0\300\1\23\1\0\0\373\210\247\5\1\0\0\0"..., 32) = 32 poll([{fd=3, events=POLLIN}], 1, 0) = 0 ioctl(3, FIONREAD, [0]) = 0 poll([{fd=3, events=POLLIN}, {fd=5, events=POLLIN, revents=POLLIN}], 2, 0) = 1 gettimeofday({1069929043, 931590}, NULL) = 0 gettimeofday({1069929043, 933517}, NULL) = 0

Aleksander Adamowski

Reporter

Comment 5

•

21 years ago

Sorry for previous comment, I tested with an olde release... Fix to bug 172786 changed the behaviour significantly. Here's what it looks like with the latest nightly build from Nov 25: Browsing the folder runs acceptably (I do have threading without "Re" set to false), and when I delete half of messages, Mozilla initially behaves well: It doesn't stop responding; It shows "Moving messages to Trash" message in the status bar; But when the move operation gets finished, it enters an unresponsive state again. Consumes all CPU, and strace with "-r" (relative timestamps) shows this: 0.001829 write(42, "X-Mozilla-Status: 0008", 22) = 22 0.001876 lseek(42, 0, SEEK_CUR) = 1583644 0.001815 lseek(42, 0, SEEK_CUR) = 1583644 0.001816 lseek(42, 0, SEEK_END) = 169382599 0.001806 lseek(42, 1583644, SEEK_SET) = 1583644 0.001813 lseek(42, 1583623, SEEK_SET) = 1583623 0.001825 lseek(42, 0, SEEK_CUR) = 1583623 0.001777 read(42, "-Mozilla-Status: 0008\nX-Mozilla-"..., 49) = 49 0.001945 lseek(42, 0, SEEK_CUR) = 1583672 0.001818 lseek(42, 0, SEEK_CUR) = 1583672 0.001793 lseek(42, 0, SEEK_END) = 169382599 0.001808 lseek(42, 1583672, SEEK_SET) = 1583672 0.001812 lseek(42, 1583645, SEEK_SET) = 1583645 1.228290 lseek(42, 0, SEEK_CUR) = 1583645 0.002306 lseek(42, 0, SEEK_CUR) = 1583645 0.001817 lseek(42, 0, SEEK_END) = 169382599 0.001817 lseek(42, 1583645, SEEK_SET) = 1583645 0.001814 lseek(42, 1585981, SEEK_SET) = 1585981 0.001796 lseek(42, 0, SEEK_CUR) = 1585981 0.001809 read(42, "X-Mozilla-Status: 0000\nX-Mozilla"..., 49) = 49 0.001945 lseek(42, 0, SEEK_CUR) = 1586030 0.002010 lseek(42, 0, SEEK_CUR) = 1586030 0.001801 lseek(42, 0, SEEK_END) = 169382599 0.001809 lseek(42, 1586030, SEEK_SET) = 1586030 0.001818 lseek(42, 1586004, SEEK_SET) = 1586004 0.001815 lseek(42, 0, SEEK_CUR) = 1586004 0.002219 lseek(42, 0, SEEK_CUR) = 1586004 0.002014 lseek(42, 0, SEEK_END) = 169382599 0.001812 lseek(42, 1586004, SEEK_SET) = 1586004 0.001815 lseek(42, 1585981, SEEK_SET) = 1585981 0.001841 lseek(42, 0, SEEK_CUR) = 1585981 0.002065 write(42, "X-Mozilla-Status: 0008", 22) = 22 0.002042 lseek(42, 0, SEEK_CUR) = 1586003 0.001825 lseek(42, 0, SEEK_CUR) = 1586003 As visible, occasional lseek()'s take over 1 second to complete.

David :Bienvenu

Comment 6

•

21 years ago

when you delete messages from a local folder, we have to go mark the messages deleted in the local mail folder, adjusting the x-mozilla-status line of each message to indicate that it has been deleted. This involves seeking to each message and rewriting the x-mozilla-status line, as you can tell from your log. It's not really mozilla's fault if an individual seek take a second, but I'm not sure that it's really the seek that's taking a second. I suspect what's happening is that for every 300 messages deleted, we're committing the .msf file and that's taking close to a second. I can't tell how many messages you're deleting, but if it's 1000, for example, you'll have 3 db commits. We were trying to keep the db in sync, but in this particular example, it's the wrong thing to do, and I'll try to attach a patch to fix it.

Status: NEW → ASSIGNED

Aleksander Adamowski

Reporter

Comment 7

•

21 years ago

The long seeks occured more frequently. Something around every 5-th - 10th write, a 1 second pause. But this might really be the fault of operating system, since when the folder size dropped after successful delete (I've let it finish its work...) now there are no pauses. It has 120 MB and about 70000 messages and strace doesn't show delays when I further delete large amount of messages. Consumes all CPU, but finishes in a justifiable amount of time. Maybe I should mark this bug as depending on bug 58508...

Aleksander Adamowski

Reporter

Comment 8

•

21 years ago

On the other hand, this bug is really about slow threading by same subject, not deletion. So ignore the stuff about slow deletion.

Aleksander Adamowski

Reporter

Comment 9

•

21 years ago

I've just discovered one more performance problem with the big folder, this time related to marking messages as read. Filed a new bug 227011.

Aleksander Adamowski

Reporter

Comment 10

•

21 years ago

May be important: I still have this pref set to false: user_pref("mail.thread_without_re", false); And all messages in the folder have tha same subject, but it doesn't start with "Re:". So Mozilla doesn't thread them (or at least isn't supposed to).

Aleksander Adamowski

Reporter

Comment 11

•

21 years ago

The fix for bug 227011 improved responsiveness an performance of operations on large quantities of messages considerably. However, this issue (threading large quantities of messages) is still present. Testing with build 2003120209, I've discovered that when threading without "Re" is off, building summary for a folder works fine, progressing at a constant rate. When threading is on, strace shows that a certain system call gets longer and longer to execute, its time of execution grows together with the number of messages for which summary waas already built. The syscall looks like this: write(5, "\372", 1) = 1 I've checked file descriptor number 5 of that mozilla process with "lsof -d 5 | grep moz" - it is some sort of a FIFO pipe. When building of summary file is at the beginning, this syscall takes about 0.1 second to complete. When summary for 2000 messages is completed, this syscall takes 0.7 second to complete. When summary for 4000 messages is completed, this syscall takes 1.2 second to complete. When summary for 6000 messages is completed, this syscall takes 1.7 second to complete. When summary for 8000 messages is completed, this syscall takes 2.1 second to complete. When summary for 10000 messages is completed, this syscall takes 2.7 second to complete. This is exactly linear growth of execution time of that syscall proportional to number of messages summarized so far.

Aleksander Adamowski

Reporter

Comment 12

•

21 years ago

Attached file Strace of Mozilla process — Details

This is a strace of Mozilla process (latest build 2003120209 with fix for bug 227011) doing the rebuild of summary file for big folder. Strace was started before entering that folder and interrupted when rebuilding started to progress very slowly. It's compressed with bzip2.

Aleksander Adamowski

Reporter

Comment 13

•

21 years ago

Comment on attachment 136719 [details] Strace of Mozilla process Threading was _on_ during this strace.

Matthias Bockelkamp

Updated

•

21 years ago

Blocks: 236849

Andreas Kunz

Comment 14

•

21 years ago

Bug 159660 and bug 90452 might be related.

Myk Melez [:myk] [@mykmelez]

Updated

•

20 years ago

Product: MailNews → Core

Magnus Melin [:mkmelin]

Comment 15

•

19 years ago

*** Bug 262843 has been marked as a duplicate of this bug. ***

Wayne Mery (:wsmwk)

Comment 16

•

17 years ago

> Do you still see this problem given that bug 90452 is fixed ? Aleksander on 11/7/2006: ... threading a folder containing 4300 messages with the same subject took about 30 seconds on a 2.80 GHz P4, so it's still quite slow.

Phil Ringnalda (:philor)

Updated

•

17 years ago

QA Contact: esther → database

Wayne Mery (:wsmwk)

Comment 17

•

17 years ago

Reporter is gone. Perhaps fixed by bug 90452 2005-07-27 => incomplete

Status: ASSIGNED → RESOLVED

Closed: 17 years ago

Resolution: --- → INCOMPLETE

Aleksander Adamowski

Reporter

Comment 18

•

17 years ago

I'm not gone. The problem is still present on Thunderbird 2.0.0.14 (20080505). On a folder with 5941 messages, 5829 of which have the same subject ("NOTICE: mail delivery status."), when I click the thread button, Thunderbird starts consuming all available CPU for about 2 minutes. The CPU is a 1.73GHz Intel Pentium M. There are 2GB of RAM available. There's no excessive RAM, disk or network usage - only CPU. During that time, Thunderbird's window doesn't repaint, and the application appears frozen - a typical user would likely kill it before it finishes its work. When it finishes the folder view looks normal, and the 5829-long thread is rendered correctly. Collapsing the thread works instantly, but expanding it back repeats the process of 2-minute freeze.

Status: RESOLVED → REOPENED

Resolution: INCOMPLETE → ---

David :Bienvenu

Comment 19

•

17 years ago

that problem is about expanding a large thread, and I don't think it has to do with the subjects being the same at all - I think it's simply a problem expanding huge threads.

Nobody; OK to take it and work on it

Assignee

Updated

•

16 years ago

Product: Core → MailNews Core

Wayne Mery (:wsmwk)

Comment 20

•

16 years ago

(In reply to comment #19) > that problem is about expanding a large thread, and I don't think it has to do > with the subjects being the same at all - I think it's simply a problem > expanding huge threads. Is there a different bug for "expanding large thread" performance? (there may be, but I'm not finding one)

Keywords: perf

Wayne Mery (:wsmwk)

Comment 21

•

15 years ago

(In reply to comment #1) > This problem occurs on a folder where I have a large number of bounce messages > generated by a mail server that I admin. > > Currently the size of that folder is 162 MB, but the problem was significant > even when the folder had ~300 messages (that would amount to a couple of MBs > only) on a relatively fast hardware (Pentium III, 512MB RAM, UDMA 66 hard drive). > > See my comments in bug 172786 for details. not resolved?

Aleksander Adamowski

Reporter

Comment 22

•

15 years ago

Presently, I have no idea. I've reported the bug over 6 years ago, lots have changed. I'm unable to test it currently, since I have a different job and don't use IMAP mail anymore. If someone's still interested in fixing it (if not done already), the method to reproduce is to flood the mailbox with lots of identical small messages, then some amount of other messages, then trying to thread the message list. It there's noone interested, I have nothing against closing it with WORKSFORME (an additional resolution code: "OLD" might be handy in Bugzilla...).

WADA:World Anti-bad-Duping Agency

Updated

•

15 years ago

Blocks: 452221

WADA:World Anti-bad-Duping Agency

Comment 23

•

15 years ago

Problem cited in bug summary can be observed with local mail folder too. See bug 452221 comment #0 and description on attached test mails. However, it may depends on user's preference choice. As default prefs.js setting around threading was changed by Tb 3(mail.strict_threading, mail.thread_without_re, ...), prefs.js setting change may be needed to observe phenomenon with test mails attached to bug 452221.

David :Bienvenu

Updated

•

12 years ago

Assignee: dbienvenu → nobody

Wayne Mery (:wsmwk)

Updated

•

9 years ago

Summary: Slow threading when all messages have the same subject → Slow threading when all messages have the same subject, with mail.strict_threading

Wayne Mery (:wsmwk)

Comment 24

•

7 years ago

this wouldn't be linux only

OS: Linux → All

BMO Automation

Updated

•

2 years ago

Severity: normal → S3