Closed Bug 1386585 Opened 2 years ago Closed 2 years ago

saving/opening a large attachment takes long when message has abnormal MIME structure - multipart/mixed is not at the top level (and attachment sits in alternative part)

Categories

(MailNews Core :: MIME, defect)

defect
Not set

Tracking

(thunderbird_esr5255+ fixed, thunderbird56 fixed, thunderbird57 fixed)

RESOLVED FIXED
Thunderbird 57.0
Tracking Status
thunderbird_esr52 55+ fixed
thunderbird56 --- fixed
thunderbird57 --- fixed

People

(Reporter: berger, Assigned: jorgk)

References

Details

Attachments

(3 files, 2 obsolete files)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36

Steps to reproduce:

I've sent me an email with a bigger attachment around 20MB to my IMAP account.
I've disabled local copies whatsoever.
I've disabled junk and phishing checks
I've disabled the global search index.


Actual results:

Saving the file takes around 6 to 7 seconds on my machine. With Thunderbird 45 it was around 1-2 seconds. The problem is that saving an attachment blocks the GUI thread completely so one can't interact with TB any longer.

With iostat, lsof  and so on I was able to figure out that copying to /tmp/nsma-1 was very fast. After that the tmp file get's copied to it's final destination and it's this part which is very slow and cpu intensive while it was fast before (around TB 45).

Below you'll see the iostat output during the save action on TB56 and TB45.

Thunderbird 56.

          disk0       cpu     load average
    KB/t tps  MB/s  us sy id   1m   5m   15m
   88.44  18  1.55     0.00   0  0.00   9  6 86  2.32 1.92 1.90
  743.07  30 21.65     0.00   0  0.00  25  5 70  2.46 1.96 1.92
  344.67   6  2.01     0.00   0  0.00  26  3 70  2.46 1.96 1.92
  560.36  11  6.00     0.00   0  0.00  27  3 70  2.46 1.96 1.92
   97.04  94  8.87     0.00   0  0.00  16  8 75  2.46 1.96 1.92
  578.40   5  2.81     0.00   0  0.00  10  4 86  2.46 1.96 1.92

 
On Thunderbird 45: 
         disk0       cpu     load average
    KB/t tps  MB/s  us sy id   1m   5m   15m
  798.07  29 22.48  23  5 72  2.16 1.93 1.89
  177.35  92 15.87  22  4 74  2.06 1.91 1.89
    4.00   2  0.01   7  3 90  2.06 1.91 1.89
    5.00   4  0.02  19  9 72  2.06 1.91 1.89
 


Expected results:

It should be nearly as fast as before.

Maybe the download action can be put into a separate download thread. Visible in activities, without even blocking the gui thread.
Nice description of the problem.

What happened is that we've changed caching technology in TB 52. The max. cache size entry is 15 MB, and you're overflowing this with your 20 MB attachment.

So in the end, the attachment will be fetched from the server over and over and over again.

Further reading: Bug 1382008 comment #7 (quote):

To increase the limit, do this:
Tools > Options, Advanced, General, Config editor. Paste browser.cache.memory.max_entry_size into the search. Change value from 15000 to - say - 30000 (30 MB). Next paste browser.cache.memory.capacity into the search. Change value from 120000 to 240000. It needs to be eight times the other value.

I'm not sure what to do with this bug, maybe double the limit in general since people seem to consider 20 MB as a "normal" large size.
Summary: saving an attachment takes much longer and consumes much more CPU → saving a large attachment takes much longer and consumes much more CPU when attachment is larger that max cache size entry of 15 MB
Component: Untriaged → Networking: IMAP
Product: Thunderbird → MailNews Core
Attached patch 1386585-cache-size.patch (obsolete) — Splinter Review
Kent, I think we need to increase the limit here so that 20MB attachments can be handled better.
Assignee: nobody → jorgk
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Attachment #8893024 - Flags: review?(rkent)
This only matters when someone decides not to store messages locally, right? And why do they do that? Presumably to save disk space.

Isn't the workaround here to allow that folder to work offline, so that the cache is not needed? And if that is unacceptable, why is it acceptable to increase the disk space usage of the cache? Aren't we working in a limited disk space scenario here?
(In reply to Kent James (:rkent) from comment #3)
> This only matters when someone decides not to store messages locally, right?
Right.

> And why do they do that? Presumably to save disk space.
The user in bug 1382008 comment #8 talked of a 6GB Gmail mail storage.

> Isn't the workaround here to allow that folder to work offline, so that the
> cache is not needed?
Sure, that's what I suggested.

> And if that is unacceptable, why is it acceptable to
> increase the disk space usage of the cache?
No, it's cached in memory during the session. However, I understand that M-C cache2 will page some of it out to disk. I don't know the details.

> Aren't we working in a limited disk space scenario here?
The question doesn't apply, see previous answer. However, some users do apparently not want to store e-mail locally.

Since the system starts to work very slowly once you hit the limit without any visible warning, see bug 1382008, all we can do right now is to increase the limit.
I'd like to give you a bit more background information about our environment.

We have one IMAP server with big SSD backed disk space plus an additional "ALTstore" backup space inside dovecote plus the usual backup mechanisms on tape. Essentially at least 3 copies. With email archiving four copies of all mails.

For the clients we have several big linux based terminal servers, on which our employers spawn their XFCE desktops through XDMCP or turbovnc based on their location (local or remote). Yes they work entirely on linux desktops. Right now we have about 50 desktops, that's about 50 thunderbird processes per server at the moment. The server is handling this perfectly. 
Additionally all thunderbird processes are cut off the internet. This is intentionally, to prevent tracking via images.

We don't want the desktops to pollute the local home directories with an additional copy of all theirs mails. It's just not required. Thats the reason why we want to index the mails but not to copy every email. It's already on our servers several times.
And our IMAP servers are never offline :-)


Today I'll test your workaround in our environment.

best regards,

dennis
Personally I don't think that fetching from IMAP server over and over again is the problem.
First thing is that fetching in general is very fast here. In under 1 second the file is there.
I've watched the traffic from exactly this thunderbird process and the traffic on the IMAP server itself with tcpdump and netstat during the save operation. The file gets downloaded and saved to /tmp/nsma-1 in about 1 second. see output at 13:06:57
After the download is complete the move operation to it's final destination is very slow and consumes a lot of cpu resources.
See output at 13:06:58

I've monitored the thunderbird process with pidstat and attached the output here. You can see the bytes written/sec, the timestamps, the cpu utilization, the stack and more.

This test was done on a fresh profile with the browser.cache.memory.* settings that you've described above. 


pidstat -r -s -u -p 4630 -v 1 > pidstat.txt

--------------



Linux 3.16.0-4-amd64 (app) 	03.08.2017 	_x86_64_	(12 CPU)

13:06:54      UID       PID    %usr %system  %guest    %CPU   CPU  Command
13:06:55     1113      4630    2,94    0,98    0,00    3,92     0  thunderbird

13:06:54      UID       PID  minflt/s  majflt/s     VSZ    RSS   %MEM  Command
13:06:55     1113      4630      0,00      0,00 2260844 295216   0,52  thunderbird

13:06:54      UID       PID StkSize  StkRef  Command
13:06:55     1113      4630     144     128  thunderbird

13:06:54      UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
13:06:55     1113      4630      0,00      0,00      0,00       0  thunderbird

13:06:54      UID       PID threads   fd-nr  Command
13:06:55     1113      4630      65      95  thunderbird

13:06:55      UID       PID    %usr %system  %guest    %CPU   CPU  Command
13:06:56     1113      4630    8,00    1,00    0,00    9,00     0  thunderbird

13:06:55      UID       PID  minflt/s  majflt/s     VSZ    RSS   %MEM  Command
13:06:56     1113      4630      1,00      0,00 2260844 295216   0,52  thunderbird

13:06:55      UID       PID StkSize  StkRef  Command
13:06:56     1113      4630     144     128  thunderbird

13:06:55      UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
13:06:56     1113      4630      0,00      0,00      0,00       0  thunderbird

13:06:55      UID       PID threads   fd-nr  Command
13:06:56     1113      4630      65      95  thunderbird

13:06:56      UID       PID    %usr %system  %guest    %CPU   CPU  Command
13:06:57     1113      4630   29,00    7,00    0,00   36,00     2  thunderbird

13:06:56      UID       PID  minflt/s  majflt/s     VSZ    RSS   %MEM  Command
13:06:57     1113      4630   7112,00      0,00 2260844 295176   0,52  thunderbird

13:06:56      UID       PID StkSize  StkRef  Command
13:06:57     1113      4630     144     128  thunderbird

13:06:56      UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
13:06:57     1113      4630      0,00  22264,00      0,00       0  thunderbird

13:06:56      UID       PID threads   fd-nr  Command
13:06:57     1113      4630      65      96  thunderbird

13:06:57      UID       PID    %usr %system  %guest    %CPU   CPU  Command
13:06:58     1113      4630   92,00    3,00    0,00   95,00     7  thunderbird

13:06:57      UID       PID  minflt/s  majflt/s     VSZ    RSS   %MEM  Command
13:06:58     1113      4630   2127,00      0,00 2260844 303496   0,54  thunderbird

13:06:57      UID       PID StkSize  StkRef  Command
13:06:58     1113      4630     144     128  thunderbird

13:06:57      UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
13:06:58     1113      4630      0,00   1824,00      0,00       0  thunderbird

13:06:57      UID       PID threads   fd-nr  Command
13:06:58     1113      4630      65      96  thunderbird

13:06:58      UID       PID    %usr %system  %guest    %CPU   CPU  Command
13:06:59     1113      4630   96,00    2,00    0,00   98,00     8  thunderbird

13:06:58      UID       PID  minflt/s  majflt/s     VSZ    RSS   %MEM  Command
13:06:59     1113      4630   2265,00      0,00 2260844 312736   0,55  thunderbird

13:06:58      UID       PID StkSize  StkRef  Command
13:06:59     1113      4630     144     128  thunderbird

13:06:58      UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
13:06:59     1113      4630      0,00   1952,00      0,00       0  thunderbird

13:06:58      UID       PID threads   fd-nr  Command
13:06:59     1113      4630      65      96  thunderbird

13:06:59      UID       PID    %usr %system  %guest    %CPU   CPU  Command
13:07:00     1113      4630   97,00    1,00    0,00   98,00     5  thunderbird

13:06:59      UID       PID  minflt/s  majflt/s     VSZ    RSS   %MEM  Command
13:07:00     1113      4630   2219,00      0,00 2260844 321448   0,57  thunderbird

13:06:59      UID       PID StkSize  StkRef  Command
13:07:00     1113      4630     144     128  thunderbird

13:06:59      UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
13:07:00     1113      4630      0,00   1920,00      0,00       0  thunderbird

13:06:59      UID       PID threads   fd-nr  Command
13:07:00     1113      4630      65      96  thunderbird

13:07:00      UID       PID    %usr %system  %guest    %CPU   CPU  Command
13:07:01     1113      4630   99,00    1,00    0,00  100,00     5  thunderbird

13:07:00      UID       PID  minflt/s  majflt/s     VSZ    RSS   %MEM  Command
13:07:01     1113      4630   2292,00      0,00 2260844 330688   0,59  thunderbird

13:07:00      UID       PID StkSize  StkRef  Command
13:07:01     1113      4630     144     128  thunderbird

13:07:00      UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
13:07:01     1113      4630      0,00   1984,00      0,00       0  thunderbird

13:07:00      UID       PID threads   fd-nr  Command
13:07:01     1113      4630      65      96  thunderbird

13:07:01      UID       PID    %usr %system  %guest    %CPU   CPU  Command
13:07:02     1113      4630   97,00    3,00    0,00  100,00     8  thunderbird

13:07:01      UID       PID  minflt/s  majflt/s     VSZ    RSS   %MEM  Command
13:07:02     1113      4630   4786,00      0,00 2264940 340464   0,60  thunderbird

13:07:01      UID       PID StkSize  StkRef  Command
13:07:02     1113      4630     144     128  thunderbird

13:07:01      UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
13:07:02     1113      4630      0,00   1856,00      0,00       0  thunderbird

13:07:01      UID       PID threads   fd-nr  Command
13:07:02     1113      4630      65      96  thunderbird

13:07:02      UID       PID    %usr %system  %guest    %CPU   CPU  Command
13:07:03     1113      4630   96,00    2,00    0,00   98,00     3  thunderbird

13:07:02      UID       PID  minflt/s  majflt/s     VSZ    RSS   %MEM  Command
13:07:03     1113      4630   5201,00      0,00 2269036 303780   0,54  thunderbird

13:07:02      UID       PID StkSize  StkRef  Command
13:07:03     1113      4630     144     128  thunderbird

13:07:02      UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
13:07:03     1113      4630      0,00   1568,00      0,00       0  thunderbird

13:07:02      UID       PID threads   fd-nr  Command
13:07:03     1113      4630      65      96  thunderbird

13:07:03      UID       PID    %usr %system  %guest    %CPU   CPU  Command
13:07:04     1113      4630   97,00    2,00    0,00   99,00    11  thunderbird

13:07:03      UID       PID  minflt/s  majflt/s     VSZ    RSS   %MEM  Command
13:07:04     1113      4630   2274,00      0,00 2269036 313012   0,56  thunderbird

13:07:03      UID       PID StkSize  StkRef  Command
13:07:04     1113      4630     144     128  thunderbird

13:07:03      UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
13:07:04     1113      4630      0,00   1984,00      0,00       0  thunderbird

13:07:03      UID       PID threads   fd-nr  Command
13:07:04     1113      4630      65      96  thunderbird

13:07:04      UID       PID    %usr %system  %guest    %CPU   CPU  Command
13:07:05     1113      4630   98,00    2,00    0,00  100,00     6  thunderbird

13:07:04      UID       PID  minflt/s  majflt/s     VSZ    RSS   %MEM  Command
13:07:05     1113      4630   2313,00      0,00 2269036 322252   0,57  thunderbird

13:07:04      UID       PID StkSize  StkRef  Command
13:07:05     1113      4630     144     128  thunderbird

13:07:04      UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
13:07:05     1113      4630      0,00   1984,00      0,00       1  thunderbird

13:07:04      UID       PID threads   fd-nr  Command
13:07:05     1113      4630      65      96  thunderbird

13:07:05      UID       PID    %usr %system  %guest    %CPU   CPU  Command
13:07:06     1113      4630   72,00    4,00    0,00   76,00     1  thunderbird

13:07:05      UID       PID  minflt/s  majflt/s     VSZ    RSS   %MEM  Command
13:07:06     1113      4630   3608,00      1,00 2312064 330912   0,59  thunderbird

13:07:05      UID       PID StkSize  StkRef  Command
13:07:06     1113      4630     144     128  thunderbird

13:07:05      UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
13:07:06     1113      4630      8,00   1164,00      0,00       0  thunderbird

13:07:05      UID       PID threads   fd-nr  Command
13:07:06     1113      4630      73      94  thunderbird

13:07:06      UID       PID    %usr %system  %guest    %CPU   CPU  Command
13:07:07     1113      4630    0,00    0,00    0,00    0,00     1  thunderbird

13:07:06      UID       PID  minflt/s  majflt/s     VSZ    RSS   %MEM  Command
13:07:07     1113      4630      3,00      0,00 2287476 330912   0,59  thunderbird

13:07:06      UID       PID StkSize  StkRef  Command
13:07:07     1113      4630     144     128  thunderbird

13:07:06      UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
13:07:07     1113      4630      0,00      4,00      0,00       0  thunderbird

13:07:06      UID       PID threads   fd-nr  Command
13:07:07     1113      4630      66      94  thunderbird

13:07:07      UID       PID    %usr %system  %guest    %CPU   CPU  Command
13:07:08     1113      4630    2,00    0,00    0,00    2,00     5  thunderbird

13:07:07      UID       PID  minflt/s  majflt/s     VSZ    RSS   %MEM  Command
13:07:08     1113      4630     94,00      0,00 2287476 331056   0,59  thunderbird

13:07:07      UID       PID StkSize  StkRef  Command
13:07:08     1113      4630     144     128  thunderbird

13:07:07      UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
13:07:08     1113      4630      0,00      4,00      0,00       0  thunderbird

13:07:07      UID       PID threads   fd-nr  Command
13:07:08     1113      4630      66      94  thunderbird

13:07:08      UID       PID    %usr %system  %guest    %CPU   CPU  Command
13:07:09     1113      4630    0,00    0,00    0,00    0,00     9  thunderbird

13:07:08      UID       PID  minflt/s  majflt/s     VSZ    RSS   %MEM  Command
13:07:09     1113      4630      0,00      0,00 2287476 331056   0,59  thunderbird

13:07:08      UID       PID StkSize  StkRef  Command
13:07:09     1113      4630     144     128  thunderbird

13:07:08      UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
13:07:09     1113      4630      0,00      0,00      0,00       0  thunderbird

13:07:08      UID       PID threads   fd-nr  Command
13:07:09     1113      4630      66      94  thunderbird


Durchschn.:   UID       PID    %usr %system  %guest    %CPU   CPU  Command
Durchschn.:  1113      4630   58,99    1,93    0,00   60,92     -  thunderbird

Durchschn.:   UID       PID  minflt/s  majflt/s     VSZ    RSS   %MEM  Command
Durchschn.:  1113      4630   2283,29      0,07 2271497 317161   0,56  thunderbird

Durchschn.:   UID       PID StkSize  StkRef  Command
Durchschn.:  1113      4630     144     128  thunderbird

Durchschn.:   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
Durchschn.:  1113      4630      0,53   2563,78      0,00       0  thunderbird

Durchschn.:   UID       PID threads   fd-nr  Command
D
Thanks for the insight into your operation and the analysis. In general it's better to attach logs as text files (text/plain) and only quote the relevant parts.

I'm not a Linux expert, so with "see output at 13:06:57" I don't even know what to look for. "13:06:57" appears 12 times in your log.

Basically you're saying that increasing the limits didn't help at all, right?

The attachment is fetched and stored in /tmp/nsma-1. I'd have to look through the code to see how we process data coming from the IMAP server, but I think we hand it over to the cache. So does that get stored in a file in the cache storage which lives in a directory "cache2"? You can set browser.cache.disk.parent_directory to determine where the cache goes, "cache2" is a subdirectory of that.

So once the message has fully loaded, saving the attachment is slow, so basically getting the data from the cache and writing it to another disk file.

Once you can confirm this, then I could ask the author of the cache code for advice.
You're right about the attachment. Actually I was looking for exactly that but missed the part that it can be done above.

Yes it didn't help at all. Yes one need to be familiar with the pidstat output to understand.

To be honest I don't think it's the cache at all, during the tests I completely disabled all *.cache.enable settings including disk and memory parts. The behavior didn't change at all. 

I've the suspicion, that in TB52 the tmpfile is handled differently afterwards, like always copying to a bytebuffer even if cache is disabled. Just with a NOOP afterwards and writing out of the bytebuffer to a file.

Nevertheless I will try your suggestion next Tuesday.

best,
-dennis
I still don't understand which "temp file" you're talking about. As I said, the data received from the IMAP connection is handed over to the cache which may or may not store it into a file. What is the exact name/path of that temp file?

I know that during mail composition/sending there are a few temp files involved, but I'm not aware that there is a temp file when receiving a multipart message via IMAP. But I might be wrong as I haven't had time to look at it. Again, the name/path of that temp file could make my investigation easier.

Another question: You're talking about saving the attachment. What about opening/viewing it, does that have the same delay?

And everything was faster in TB 45, right?
The temp file I meant is "/tmp/nsma-1" or "/tmp/nsma". 
What happens during the save is this, the attachment gets downloaded to "/tmp/nsma*" which is very fast. After that the actual move operation to the final destination is slow and very cpu intensive. With all caches disabled there is still this temporary move operation. Opening and viewing has the same delay yes.

In TB45 it is a lot faster, yes. The behavior basically is this. It gets downloaded in 1 sec to /tmp/nsma and in under 1 sec to move the final to final destination and it's not cpu intensive.
Hmm, and that temp file contains the attachment, you've verified that, I assume.

I'd have to dig through the code to see where we store that. I'm a bit surprised since the temp files I know are called nsmail.tmp, nsemail.html and nscopy.tmp. Nothing without a file extension.

I also can't see "nsma" in out source tree.

The only thing I can think of right now is the AV integration:
Tools > Options, Security, Antivirus
also
Tools > Options, Advanced, General: Allow Windows Search to search messages.
But then you're on Linux and that option doesn't exist.

Aceman, have you come across temp files "/tmp/nsma" on Linux?
Flags: needinfo?(acelists)
Hmm thanks for the hint. I was able to copy away the tempfile from /tmp/nsma and took a look at it.
It's the multimime attachment part in base64 encoded form. So what's happening and what's more time consuming could be the decoding part.
Sorry, here the terminology has gone astray: "multimime attachment part in base64 encoded form". Multimime?

You mean there you see a base64 encoded attachment part, something like (only much larger)?

Content-Type: image/png;
 name="lijbmghmkilicioj.png"
Content-Transfer-Encoding: base64
Content-ID: <part1.1BFAF2AE.F1EC3CDA@jorgk.com>
Content-Disposition: inline;
 filename="lijbmghmkilicioj.png"

iVBORw0KGgoAAAANSUhEUgAAACoAAAAZCAIAAADITysPAAAAeklEQVRIie3SwQ2AIAyFYTZT
CkdbhlIBt3YBi3kJEUWS/8bhS0uN50xLgvKcoZyeKT9fNvh+eC8H1sd4iVrGNeYlOokUgNBb
KS0ftfviiffGvBWsyssf/At59fok12rw6N9bTlA3vDZlY/6h5f+YJ16dbFDoqU5Kc4gn32d6
E1RB/bUAAAAASUVORK5CYII=

Or do you see the entire multipart message in there? With all message headers?

As I said various times before, I'm surprised those temp files exist. They are surely created when composing/sending a message, but to my knowledge not when reading one.

Since you seem to be a friend of logging, maybe try IMAP logging to see whether that gives us any hints:
https://wiki.mozilla.org/MailNews:Logging#Main_module_options_within_MailNews
Make sure you get the log file name right, see bug 1371498 comment #4.

There is also cache logging, but that gets very wordy and so far we're pursuing the theory that the cache isn't involved here.

What about the AV option from comment #11, have you checked that?
OK, On Windows I did that:
I set the cache entry size to 2 MB and downloaded (via IMAP of course) a message with a 5 MB embedded-image: No temporary file created.
I downloaded a message with a 7 MB PDF attachment. When opening the attachment I get a temporary file, like eidE9afC.pdf.part, which comes from the M-C download manager, I believe. When I chose to save the file, I can see the attachment in the temp directory. I don't see any base64-encoded message parts in the temp directory anywhere.
To be clear. I meant the base64 encoded attachment part. Exactly like the one you've posted above only much larger.

AV is not set at all.

I can do a test with logging next Tuesday.
(In reply to Dennis Berger from comment #15)
> To be clear. I meant the base64 encoded attachment part. Exactly like the
> one you've posted above only much larger.
OK, so a message part, with its header and content, which for binary attachments (images, PDF, etc.) will be base64 encoded.

I've done some more digging, and finally found the code that writes "nsma" files!
https://dxr.mozilla.org/comm-central/search?q=regexp%3AnsMsgCreateTempFile.*nsma&redirect=false
mailnews/mime/src/mimemrel.cpp
mailnews/mime/src/mimepbuf.cpp

It sits in our ancient MIME module at has very little to to with IMAP.

The code in mimepbuf.cpp is used for multipart/alternative messages, as we read here:
https://dxr.mozilla.org/comm-central/rev/13e8fa67e60a3180c22702527f6f1079f290ce90/mailnews/mime/src/mimepbuf.h#19
mimepbuf.cpp has not changed since TB 38.
Multipart alternative is used where the message contains both a plain text and a HTML body.

The code in mimemrel.cpp deals with multipart/related messages. The are used, for example, for HTML message with embedded images. There is only one change in mimemrel.cpp between TB 38 and TB 52 which has to do with buffer size when reading a file, I'm not sure which:
https://hg.mozilla.org/comm-central/rev/6e8535df0656#l20.23 in bug 558528. Before a 10KB buffer was used, now a 16KB buffer is used. I don't see how that would drive down performance from 1-2 sec to 6-7 sec.

Attachments usually sit in multipart/mixed messages, where the message body is one of plaintext, HTML or multipart/related or multipart/alternative.

As we now know, those "nsma" files contain message parts, so they are subject to further processing, that is, decoding, before the attachment can be saved or opened. What you said in comment #0 "After that, the tmp file gets copied to it's final destination and it's this part which is very slow and cpu intensive while it was fast before" does not reflect reality, as the temp file isn't copied to it's final destination, but in fact read and decoded and the result gets copied to the final destination. We're on the wrong track looking for a problem in IMAP or caching, since as you confirmed, increasing the cache limits makes no difference.

We would have to see where the performance loss happens. Now that we know that it has nothing to do with IMAP, please supply us with a "slow" message, for further analysis. Please e-mail that to me (e-mail can be found in my profile) or upload it to a file hoster, like WeTransfer, if it's around 20 MB.
Summary: saving a large attachment takes much longer and consumes much more CPU when attachment is larger that max cache size entry of 15 MB → saving a large attachment takes much longer and consumes much more CPU - performance loss when processing MIME-written message part file "nsma"
Comment on attachment 8893024 [details] [diff] [review]
1386585-cache-size.patch

Nothing to do with cache.
Attachment #8893024 - Attachment is obsolete: true
Attachment #8893024 - Flags: review?(rkent)
Thanks I've sent you a "slow" message via mail.
It originated from Apple Mail with a very strange MIME structure:

  multipart/alternative
    text/plain
    multipart/mixed
      text/html
      large ZIP file attachment
      text/html

It well possible that TB is slow on this message. So my question: Is saving always slow, or just with certain messages like this one:

The correct structure should be:

  multipart/mixed
    multipart/alternative
      text/plain
      text/html
    large ZIP file attachment

I'll reshuffle the message and you tell me whether the result is also slow.
Flags: needinfo?(acelists)
Opening the attachment in the original message: 15 sec,
opening the attachment in the reshuffled message: 2 sec.

Complain to Tim Cook. Apple can't even put together a decently structured message :-(

Basically the same as bug 1362539, see bug 1362539 comment #3.
Status: ASSIGNED → UNCONFIRMED
Ever confirmed: false
See Also: → 1362539
Summary: saving a large attachment takes much longer and consumes much more CPU - performance loss when processing MIME-written message part file "nsma" → saving/opening a large attachment takes long when message has abnormal MIME structure - multipart/mixed is not at the top level (and attachment sits in alternative part)
TB 45 opens the attachment in the original message in 6 sec, so the performance has gone down. We've changed a few things in the processing of multipart/alternative messages, for example prioritising the parts differently now, so it's well possible that some slowness was introduced. The reshuffled message opens very fast in TB 45.

Alice, can you please check when opening the attachment has become slow? I'll send you the test message via WeTransfer.

I suspect bug 574989 or bug 253830.
Flags: needinfo?(alice0775)
Thank you so much Alice, as always.

This comes from bug Bug 1146099, https://hg.mozilla.org/comm-central/rev/32170e6fb298, where I replaced using a C++ function with using a JS function, so a typical performance regression.

My debug build takes about a minute to open the attachment, with the change backed out it's 7 seconds.

decodeMimeHeader
https://dxr.mozilla.org/comm-central/rev/a5260cd23f61dbda6185636334fb2e14c31f30f8/mailnews/mime/src/mimeJSComponents.js#487
might be slow and the regular expression processing:
    if (aUnfold) {
      value = value.replace(/[\r\n]\t/g, ' ')
                   .replace(/[\r\n]/g, '');
    }
won't be quick either. I looked and we always call this with aUnfold==true so strings are duplicated around in JS.

We'd have to investigate further how to speed up this code and why a decode header function seems to be called on an entire (huge) part rather than just some headers.

To be continued.
Blocks: 1146099
Status: UNCONFIRMED → NEW
Ever confirmed: true
OK, here's some debugging with the large attachment of about 300.000 lines of base64 encoding.

In the message with the good structure, MIME_DecodeMimeHeader() gets called 588.000 times to decode the file name header:
=== MIME_DecodeMimeHeader |SWI9200M_3.5-Release8-SWI9200X_03.05.24.00.zip|
but the save dialogue comes up after 162.000 invocations.

In the message with the bad structure, MIME_DecodeMimeHeader() also gets called 588.000 times, but the save dialogue comes up after those 588.000 invocations.

Somehow the later invocations take more time.

Anyway, something is wrong here, there is no need to decode this header more than once, or perhaps if various parts of the system don't talk well to one another, a few times, but 588.000 times?

Reducing the 300.000 lines of base64 to 20 still shows 40+ invocations, so two invocations per line of encoded attachment. Sigh, that simply doesn't make sense at all.

Looks like we can speed up opening any attachment.

To be continued.
Attached patch 1386585-mime-decode.patch (v1). (obsolete) — Splinter Review
Alfred, with this patch, large attachments open instantly.

I left the debug in so you can see how many times MIME_DecodeMimeHeader() is called.

I'll send you the test messages via WeTransfer and I'll attach reduced examples.
Attachment #8894281 - Flags: review?(infofrommozilla)
Attachment #8894283 - Attachment mime type: message/rfc822 → text/plain
Component: Networking: IMAP → MIME
OK, I improved the hacky version and moved getting the name into MimeOptions_write where it is (sometimes) needed.
Attachment #8894281 - Attachment is obsolete: true
Attachment #8894281 - Flags: review?(infofrommozilla)
Attachment #8894365 - Flags: review?(infofrommozilla)
Comment on attachment 8894365 [details] [diff] [review]
1386585-mime-decode.patch (v2).

Review of attachment 8894365 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/mime/src/mimei.cpp
@@ +1705,5 @@
>        if (lstatus < 0) return lstatus;
>  
> +      nsCString name;
> +      name.Adopt(MimeHeaders_get_name(hdrs, opt));
> +      MimeHeaders_convert_header_value(opt, name, false);

Thanks, this looks better to me. This case when the function really needs the name, how often does that happen per message? Is it 1-2 times for each message, or are there possible messages where it is still needed for every line (just not the message in the test case)? It seems this is just writing out the name of the attachment in some formatting so it should be once per message?
(In reply to :aceman from comment #29)
> Thanks, this looks better to me.
Well, if you like it, make it r+. It's really only refactoring, no logic changes.

MimeHeaders_get_name(hdrs, opt) is quite expensive, since it searches headers for anything that could be a name. The convert/decode is even more expensive. If you look at MimeOptions_write() you will see that the name is only used to write
<FIELDSET CLASS="mimeAttachmentHeader">
<LEGEND CLASS="mimeAttachmentHeaderName">
which to my knowledge happens when making a page with inline attachments for display/printing. It does not happen for non-inline attachments, like PDFs or ZIP files, like in this case.

> This case when the function really needs
> the name, how often does that happen per message?
Once per attachment, or not at all for non-visible attachments. In the case of the ZIP files, the underlying MIME_DecodeMimeHeader() isn't called at all any more, so from an expensive 588.000 calls down to nothing.

I expect *huge* performance gains everywhere from this simple change.
Comment on attachment 8894365 [details] [diff] [review]
1386585-mime-decode.patch (v2).

One review will do, whoever comes first. Green try here:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=05e1b8007ab221b9ca2443d660b31bb72679b47f

The usual Z failures from this list: https://mzl.la/2lrQTHC
Attachment #8894365 - Flags: review?(acelists)
Comment on attachment 8894365 [details] [diff] [review]
1386585-mime-decode.patch (v2).

Review of attachment 8894365 [details] [diff] [review]:
-----------------------------------------------------------------

Great, the removal of the calls is seen with any attachment.
Good find guys.

::: mailnews/mime/src/comi18n.cpp
@@ +21,5 @@
>  void MIME_DecodeMimeHeader(const char *header, const char *default_charset,
>                             bool override_charset, bool eatContinuations,
>                             nsACString &result)
>  {
> +  printf("=== MIME_DecodeMimeHeader |%s|\n", header);

I assume this debugging will not land.
Attachment #8894365 - Flags: review?(acelists) → review+
Did it land on try? ;-)
Attachment #8894365 - Flags: review?(infofrommozilla)
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/406683192623
MIME: Obtain and decode name header only when required. r=aceman
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 57.0
Comment on attachment 8894365 [details] [diff] [review]
1386585-mime-decode.patch (v2).

MIME refactoring with a huge performance gain.
Attachment #8894365 - Flags: approval-comm-esr52?
Attachment #8894365 - Flags: approval-comm-beta+
I've tested the nightly build from yesterday and I can confirm that the problem is gone now.
I've also tested big attachments from different MUAs. I've no delay whatsoever.

Thanks to all involved for fixing this bug :-)
Attachment #8894365 - Flags: approval-comm-esr52? → approval-comm-esr52+
You need to log in before you can comment on or make changes to this bug.