Closed Bug 1579341 Opened 1 year ago Closed 5 months ago

Extra imap fetch occurs for non-inline attachments instead of fetch only on demand.

Categories

(MailNews Core :: Attachments, defect)

defect
Not set

Tracking

(thunderbird_esr6869+ fixed, thunderbird70 fixed, thunderbird71 fixed)

RESOLVED FIXED
Thunderbird 71.0
Tracking Status
thunderbird_esr68 69+ fixed
thunderbird70 --- fixed
thunderbird71 --- fixed

People

(Reporter: gds, Assigned: alta88)

References

(Regression, )

Details

(Keywords: perf, regression)

Attachments

(3 files, 2 obsolete files)

+++ This bug was initially created as a clone of Bug #1454542 comment 29 +++

User Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0
Build ID: 20180327091359

I have a message sitting in a folder with no offline store. The message is very simple with a text only body, a zip file attachment and a large text file attachment. I have view attachments inline turned off and the fetch on demand for attachments enabled and the attachment files are larger than the fetch on demand threshold (30k bytes).

With ram cache empty (after tb startup or after clearing cache in advanced setting) and clicking on the message, the body mime part (just 14 bytes) is fetched and both attachments are fetched. The body is stored to a cache entry and the attachments are also stored, each to a separate cache entry. The cache key for the body part starts with imap://username@host... but the attachment cache keys both leave off the username and are just imap://host...

So two question arise at this point. Why are the non-inline attachments fetched at all at this time and why do the URL's for the attachments not also have a username?

Now the message displays correctly with the short body text and the two links for the attachments. If the zip attachment is clicked, which should be in cache, it is fetched again. This is because the URL for fetching it now contain the username, imap://username@host... so this "key" is not found in cache and a new entry is written and the zip file can now be accessed using an external viewer. The same thing occurs when the text file attachment link is clicked.

Subsequent accesses to this message and its attachments are obtained from cache since now all parts have a valid entry key in cache.

On a fast network this extra activity may not be noticed.

This is observed with the patch for bug 1454542 rebased and applied to the current trunk and also without the patch applied to trunk. This does not occur in ESR 60.

Based on information provided by alta88, this seems to be a regression of bug 1345167. I have also checked that backing out the changes associated with bug 1345167 fixes the problem.

Keywords: regression
Regressed by: 1345167
Summary: [regression] Extra imap fetch occurs for non-inline attachments instead of fetch only on demand. → Extra imap fetch occurs for non-inline attachments instead of fetch only on demand.
Regressed by: 1454542

Sorry for the noise, pasted wrong while looking at bug 1454542 comment #32.

No longer regressed by: 1454542

The cache key for the body part starts with imap://username@host...

Should that just be stripped? How is the username relevant for the cache key?

(In reply to Jorg K (GMT+2) from comment #2)

The cache key for the body part starts with imap://username@host...

Should that just be stripped? How is the username relevant for the cache key?

I agree, and not just for the cache key (though that would be the first step), credentials were a thing 20yrs ago when this code was first written, they have been deprecated from fetch() for good reasons. To get extra perspective, perhaps our resident security/privacy person can opine on credentials in urls and whether they should be removed from imap/news urls when making a network request.

Flags: needinfo?(kaie)

Without the account user name in the URL, if you have two accounts on the same server, the cache key could be ambiguous. E.g., if you have message uid 1000 on two gmail accounts and both are to be put into ram cache, you need the userid (or something else unique associated with the account) in the url string (cache key) to distinguish them.

But the main problem is that I am seeing the URLs for the two attachments (without the username) along with the main message URL (which includes the username). I should only see the main message URL when message is opened and the attachments are not inline. At least that's how it used to work. By "seeing the URL" I am referring to the URL received and processed here:
https://searchfox.org/comm-central/rev/90d756d83214006e9ddc9e6de13dc138bd478af3/mailnews/imap/src/nsImapProtocol.cpp#8822

(In reply to gene smith from comment #4)

Without the account user name in the URL, if you have two accounts on the same server, the cache key could be ambiguous. E.g., if you have message uid 1000 on two gmail accounts and both are to be put into ram cache, you need the userid (or something else unique associated with the account) in the url string (cache key) to distinguish them.

I think if we control the cache then the unique key can include anything, like nsIMsgIncomingServer.key. But Jorg is the expert on both cache2 and mailnews urls ;)

But the main problem is that I am seeing the URLs for the two attachments (without the username) along with the main message URL (which includes the username). I should only see the main message URL when message is opened and the attachments are not inline. At least that's how it used to work. By "seeing the URL" I am referring to the URL received and processed here:
https://searchfox.org/comm-central/rev/90d756d83214006e9ddc9e6de13dc138bd478af3/mailnews/imap/src/nsImapProtocol.cpp#8822

The UI really wants to show the user whether there is a valid attachment (to show size and set action button enable state and so forth). If we want parts on demand to override that (if it's not in found cache) and not query the server, then the js code can be altered to not do that. It would be useful to stick a debug in the isEmpty() function. If the sink gets a -1 from libmime then we decide to check and don't consider parts on demand. The comments in isEmpty() are reasonably clear, I hope.

I think I am wrong about the url. If you have multiple accounts on the same server I think the additional accounts will have a numerical suffix on the server name that appears in the url, e.g.,
imap.gmail.com
imap.gmail.com-1
imap.gmail.com-2
I still need to verify this but, if so, the username can be striped from a url in all cases.

I will also test the isEmpty() later on as requested in comment 5.

(In reply to gene smith from comment #6)

I think I am wrong about the url. If you have multiple accounts on the same server I think the additional accounts will have a numerical suffix on the server name that appears in the url, e.g.,
imap.gmail.com
imap.gmail.com-1
imap.gmail.com-2
I still need to verify this but, if so, the username can be striped from a url in all cases.

I was actually right about the host name component of the URLs. With multiple accounts the only difference in the URLs is the user name. E.g., from my two gmail accounts from the IMAPCache log:

D/IMAPCache OpenCacheEntry(): For URL = |imap://w4%2Ek9%2Evws%40gmail%2Ecom@imap.gmail.com:143/fetch%3EUID%3E/Junk%3E1|
D/IMAPCache OpenCacheEntry(): For URL = |imap://gd%2Esmth%40gmail%2Ecom@imap.gmail.com:143/fetch%3EUID%3E/INBOX%3E6222|

There is no number suffix. So to distinguish the cache entries when the folder name and UID are the same, without other URL changes, you need the username too.

Attached file dump.diff

(In reply to alta88 from comment #5)

But the main problem is that I am seeing the URLs for the two attachments (without the username) along with the main message URL (which includes the username). I should only see the main message URL when message is opened and the attachments are not inline. At least that's how it used to work. By "seeing the URL" I am referring to the URL received and processed here:
https://searchfox.org/comm-central/rev/90d756d83214006e9ddc9e6de13dc138bd478af3/mailnews/imap/src/nsImapProtocol.cpp#8822

The UI really wants to show the user whether there is a valid attachment (to show size and set action button enable state and so forth). If we want parts on demand to override that (if it's not in found cache) and not query the server, then the js code can be altered to not do that. It would be useful to stick a debug in the isEmpty() function. If the sink gets a -1 from libmime then we decide to check and don't consider parts on demand. The comments in isEmpty() are reasonably clear, I hope.

As can be seen in the dump.diff attachment, I added some dump() calls where -1 is mentioned as libmime return value at several places. There are some in isEmpty() and other places. The results are interleaved with the IMAPCache log outputs that can be seen in the next attachment to this bug. I'm not 100% sure this is what you are requesting but it looks like interesting information. (More details in next comment.)

Attached file cache.txt

This is the IMAPCache log outputs along with the dump() calls I just added. It show the action I took and then the log and dump lines that printed.

I think maybe a problem that's causing the redundant fetches with 68.0 is that the mime headers don't contain a "Content-Length" field. So to show the attachment lengths when the message is opened, the 68.0 code now triggers a fetch of the attachments to get the length information. However, even after doing this, the attachment size shown is somewhat in error (see the attached cache.txt file).

With 60.8 and earlier, the attachments size just shows as unknown and a attachment imap fetch doesn't occur until the attachment is clicked and opened.

Just briefly on the username. I was wrong with asking to strip it, it's always been part of the URL and hence the cache key. Just pick any bug dealing with imap: URLs, like bug 1564094 comment #0. I haven't had time to read the other comments in detail, but we should check how the cache keys without user name are caused, or Gene is already on the way there. I'll clear the NI for Kai. All this is internal, and as long we don't print passwords onto some log, we should be cool (plus, Kai has a lot of other stuff on his plate).

Flags: needinfo?(kaie)

No, you were right about removing username, let's fight "it's always been this way"-ism. It was an architectural mistake. It has lead to how many confusions of why there's a nsIMsgIncomingServer.username and .realUsername, and now fetch() can't be used unless the mailnews url is changed to having a server.key in the url query part to guarantee uniqueness in the cache.

I don't know if there's an initiative to renovate mailnews urls but there should be. Instead of string manipulation of query parts, they should be get/set in the mutate() way and there should be no credentials in the url. The alternative is reading messages using xpcom nsIBinaryInputStream or related stuff which is not web technology and will be dead eventually so of course we'll fork it.

As far as what should be done regarding getting information about attachments not-on-demand, it seems there are more bugs about "why is there a false attachment icon" than "do not display size because that's network traffic".

As for the false sizes, the intent in the original code wasn't to get the size but to check that it was at least 1, so the fetch() code just gets the first chunk and doesn't drain the input stream and the error is that it shouldn't pretend that's the size to the ui (unlike for external attachments, http and file, which should have content-length with a HEAD and will with a GET).

But the root cause of the "problem" (change in previous behavior) is boy scoutism. The misleading comment "libmime never managed to figure it out" leads to a boy scout saying "Ok maam I'll help you cross the street!" But in fact maam doesn't want to cross the street. A -1 really means parts_on_demand only.

I'll submit a fix soon, which will exclude fetching a url with credentials (imap only I believe; news does not have credentials, maybe snews does). Gene, it would be useful to run your tests against 1)local folders/pop accounts and 2)news accounts. And also please attach a patch with the cache debug code.

I'll submit a fix soon, which will exclude fetching a url with credentials (imap only I believe; news does not have credentials, maybe snews does).

Do you mean include credentials, and just username and, of course, not password? That's what is needed to help with the problem. However, when you first open a message with parts on demand we (actually OpenCacheEntry()) only want to receive the top level "fetch" url and not the fetch url for the parts like it's doing now.

Gene, it would be useful to run your tests against 1)local folders/pop accounts and 2)news accounts.

There is no cache that I know of for local, pop or news but I could be wrong. We only currently have cache debug code for IMAP stuff to avoid accessing the network on every click.

And also please attach a patch with the cache debug code.

Are you asking me to add cache debug code for pop/local/news? If so, and if possible, sounds like a big job!

(In reply to gene smith from comment #12)

I'll submit a fix soon, which will exclude fetching a url with credentials (imap only I believe; news does not have credentials, maybe snews does).

Do you mean include credentials, and just username and, of course, not password? That's what is needed to help with the problem. However, when you first open a message with parts on demand we (actually OpenCacheEntry()) only want to receive the top level "fetch" url and not the fetch url for the parts like it's doing now.

No, I mean the js code will not attempt to fetch any size with a -1, and will not resolve size for any attachment url with credentials, which afaict is only imap. So it will be reverted to how it was before regarding parts on demand.

Gene, it would be useful to run your tests against 1)local folders/pop accounts and 2)news accounts.

There is no cache that I know of for local, pop or news but I could be wrong. We only currently have cache debug code for IMAP stuff to avoid accessing the network on every click.

I noticed an OpenCacheEntry() in nntp, not sure why the others would have it but.. I don't think there's an on demand option for nntp.

And also please attach a patch with the cache debug code.

Are you asking me to add cache debug code for pop/local/news? If so, and if possible, sounds like a big job!

No, I thought you made a patch with debug; never mind, I saw it's from a built in MOZ_LOG, I hand rolled debug for dbView code...

Ok, so I will just try your patch when you have it ready.

I noticed an OpenCacheEntry() in nntp, not sure why the others would have it but.. I don't think there's an on demand option for nntp.

Sure, we cache the result, so we don't have to go to the server again. Local folders, incl. messages downloaded via POP, are already "cached" on a local disk.

Sadly, even though parts on demand is on, and an attachment has been demanded and downloaded and cached, libmime doesn't check the cache and on reselect of that message sends up no size, thus reducing the value of the cache. Although the attachment is gotten from the cache on an open/save request.

Attached patch fetch.patch (obsolete) — Splinter Review

Gene, please give this a try.

Attachment #9092140 - Flags: feedback?(gds)
Comment on attachment 9092140 [details] [diff] [review]
fetch.patch

This needs to go into 68.2/esr.
Attachment #9092140 - Flags: review?(jorgk)
Blocks: 1576584

(In reply to alta88 from comment #17)

Gene, please give this a try.

I will try this patch a bit later today. Sorry for the delay.

(In reply to alta88 from comment #17)

Created attachment 9092140 [details] [diff] [review]
fetch.patch

Gene, please give this a try.

I applied the patch and it fixes the problem. Not seeing the username-less URLs for the attachments when the full message URL is received. So this fixes the immediate problem.

However as alta88 points out, with attachments not inline and fetch attachments on demand, the length indicates size unknown for the attachments.:

> 2 attachments size unknown

When I click the attachments expansion widget I see no sizes:

test.zip a-even-smaller.txt

In this mode the attachments are properly fetched and stored to ram cache only when clicked (opened).

With attachments set to view inline and still fetch on demand, I see:

>2 attachments 13.4MB

and when I click the attachments expansion widget, I see individual sizes:

test.zip 38.1 KB a-even-smaller.txt 13.4 MB

Both attachments are not viewable inline (even the .txt due to a recent libmime change by aceman Bug 1509709) so not sure why the whole message is fetched in this mode. It should behave exactly like when attachments are not viewed inline. But this is not really related to this bug but to Bug #1454542

Thanks for the past and current very thorough qa testing..

(In reply to gene smith from comment #20)

(In reply to alta88 from comment #17)

Created attachment 9092140 [details] [diff] [review]
fetch.patch

Gene, please give this a try.

I applied the patch and it fixes the problem. Not seeing the username-less URLs for the attachments when the full message URL is received. So this fixes the immediate problem.

However as alta88 points out, with attachments not inline and fetch attachments on demand, the length indicates size unknown for the attachments.:

> 2 attachments size unknown

When I click the attachments expansion widget I see no sizes:

test.zip a-even-smaller.txt

A message of "size unknown" leads to user confusion. What, you can't figure out the size? The size is certainly knowable, it's just that parts_on_demand prevents that. And since it is the default, without UI for a user to learn what it means and make a decision, the client looks not smart. If it were up to me, this UX would be changed.

In this mode the attachments are properly fetched and stored to ram cache only when clicked (opened).

With attachments set to view inline and still fetch on demand, I see:

>2 attachments 13.4MB

and when I click the attachments expansion widget, I see individual sizes:

test.zip 38.1 KB a-even-smaller.txt 13.4 MB

Both attachments are not viewable inline (even the .txt due to a recent libmime change by aceman Bug 1509709) so not sure why the whole message is fetched in this mode. It should behave exactly like when attachments are not viewed inline. But this is not really related to this bug but to Bug #1454542

I haven't looked, but it's reasonable to guess that "view inline" overrides parts_on_demand being true. And it should, as that's what the user expects.

I haven't looked, but it's reasonable to guess that "view inline" overrides parts_on_demand being true. And it should, as that's what the user expects.

I'm pretty sure that if the attachment is only a large audio file (e.g., 10MB) and you have "view inline" enabled and you have no offline store, only the body of the message will be fetched when you select it. At least that's how I would expect it to work. When you open the attachment, only then is the audio file fetched (and stored to ram cache).

Right, I meant for attachments which are inline candidates, which audio is not, unlike image or other mimetypes, as shown in Bug 1509709.

In my test email there is a zip file and text file (see comment 20). Neither display inline even the text (assuming the pref to allow text inline is not set as added with Bug 1509709). Therefore, there is no reason to fetch either mime part (assuming parts on demand and no offline storage) regardless of how "view inline" is set.
In my test file the text is actually a lot of lines (one char per line and the file is >13M). So this brings tb to its knees when displayed inline. Also, fetching the 13M is no more necessary than fetching the much smaller zip,neither of which are are displayed inline. The huge text file should only be fetched when clicked/opened into a text editor.
I think the problem is in nsIMAPBodystructure.cpp. It still thinks text is displayed inline so it still generates the imap fetch command. Not sure why zip is being fetch as a part too.
But, sorry, this discussion probably completely OT to this bug but relevant to Bug #1454542.

Duplicate of this bug: 1581620

Looking at some new bugs, I just found duplicate bug 1581620. Will bug 1576584 also be fixed here?

Yes, I think so according to alta88; see bug 1581553 comment 11

Comment on attachment 9092140 [details] [diff] [review]
fetch.patch

Gene, I assume you wanted to give f+ here, right?

Magnus, you reviewed the original attachment changes in bug 1345167. My availability is limited this week (as you know) and it would be good to get this into TB 70 beta 2 which we're planning for tomorrow or the day after. I don't have time for testing.

BTW, the console.log() statements would have to go, but I can do that.
Attachment #9092140 - Flags: review?(mkmelin+mozilla)

Some notes: the fetch method must be GET, as link url servers do not have to advertise content-length and I've now seen this in the wild. It doesn't matter, as for links the fetch is only on user demand, for local files it's a non issue, imap won't go through this code until its urls no longer have credentials and now parts_on_demand is handled, and for news parts_on_demand isn't honored so we get the size from libmime anyway. At least that's my understanding of it all.

Regarding comment 11 and imap url renovation, no better example than Bug 1580262.

Comment on attachment 9092140 [details] [diff] [review]
fetch.patch

Review of attachment 9092140 [details] [diff] [review]:
-----------------------------------------------------------------

Gets rid of the regression. I didn't review the code at all, just applied the patch and ran a test.
Attachment #9092140 - Flags: feedback?(gds) → feedback+

I do have a somewhat OT question about the area of code touched by this patch. I created a new imap folder with no offline storage and copied about 17k messages to it from Local Folders. The copy (actually imap append) works fine but when I open the new folder, as expected, the headers for all 17k messages are fetched from the server. However, I also see interleaved in the header fetches, fetches for many, but not all, seeming random full message bodies. The url for body fetches causes OpenCacheEntry()
https://searchfox.org/comm-central/rev/90d756d83214006e9ddc9e6de13dc138bd478af3/mailnews/imap/src/nsImapProtocol.cpp#8822
to be called as though the message had been manually opened by clicking. The random and numerous full message bodies are stored to ram cache
with no other adverse effect.
So I'm just wondering if the high level .js code in the area of the patch might be affecting this. Does it request that full messages be fetched when there is no user input to request the fetch during initial folder header-only synchronization?
Note: I don't think the patch here or the original patch causes this since I also see it with the patches completely backed out.

Edit: This is caused by Junk/Spam detection. See bug 538375 comment 98

Assignee: nobody → alta88
Status: NEW → ASSIGNED
Comment on attachment 9092140 [details] [diff] [review]
fetch.patch

Review of attachment 9092140 [details] [diff] [review]:
-----------------------------------------------------------------

Looks reasonable, but needs some cleaning before landing
Attachment #9092140 - Flags: review?(mkmelin+mozilla) → review+
Comment on attachment 9092140 [details] [diff] [review]
fetch.patch

Thanks, I can take out the three console.log()s before landing. I'll stick this into T 70 beta 2 to get it on its way to TB 68.1.1 or 68.2.
Attachment #9092140 - Flags: review?(jorgk)
Attachment #9092140 - Flags: approval-comm-esr68+
Attachment #9092140 - Flags: approval-comm-beta+
Keywords: checkin-needed
Target Milestone: --- → Thunderbird 71.0

jorg, please also reformat the long comment around @@ -1907,26 +1906,24 @@ function CopyNewsgroupURL(newsgroupNode), it was left for readability in the wip patch.

According to my try run here:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=03aa3b9ec6cc30bd67b6e481c2b429c9dc1b0fbc
we get these failures on the Z1:

TEST-UNEXPECTED-FAIL | /builds/worker/workspace/build/tests/mozmill/attachment/test-attachment-size.js | test-attachment-size.js::test_detached_attachment_with_missing_file
TEST-UNEXPECTED-FAIL | /builds/worker/workspace/build/tests/mozmill/attachment/test-attachment-size.js | test-attachment-size.js::test_multiple_attachments_one_detached_with_missing_file
TEST-UNEXPECTED-FAIL | /builds/worker/workspace/build/tests/mozmill/attachment/test-attachment-menus.js | test-attachment-menus.js::test_link_enclosure_invalid
TEST-UNEXPECTED-FAIL | /builds/worker/workspace/build/tests/mozmill/attachment/test-attachment-menus.js | test-attachment-menus.js::test_link_multiple_enclosures_all_invalid

The other failures are due to the current "bustage" situation.

I'm not sure whether I can trust those artifact try runs, so here's a real one:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=3ec81acc9d2ab321cd28a8a5775b289971b46a6e

I think the Z1 will fail just the same.

Flags: needinfo?(alta88)

Hmm, test-attachment-size.js can be fixed by changing the check to:

-  if (nodeSize != "\u200b" && nodeSize != "") {
+  if (nodeSize != "\u200b" && nodeSize != "" && nodeSize != "size unknown") {

No idea whether that's the right way to go. test-attachment-menus.js seems harder to fix, there we get:

SUMMARY-UNEXPECTED-FAIL | c:\mozilla-source\comm-central\comm\mail\test\mozmill\attachment\test-attachment-menus.js | test-attachment-menus.js::test_link_enclosure_invalid
  EXCEPTION: "attachmentSaveAllSingle" should be disabled
    at: test-attachment-menus.js line 367
       assert_enabled test-attachment-menus.js:367 11
       check_toolbar_menu_states_single test-attachment-menus.js:383 19
       help_test_attachment_menus test-attachment-menus.js:516 37

SUMMARY-UNEXPECTED-FAIL | c:\mozilla-source\comm-central\comm\mail\test\mozmill\attachment\test-attachment-menus.js | test-attachment-menus.js::test_link_multiple_enclosures_all_invalid
  EXCEPTION: "attachmentSaveAllMultiple" should be disabled
    at: test-attachment-menus.js line 367
       assert_enabled test-attachment-menus.js:367 11
       check_toolbar_menu_states_multiple test-attachment-menus.js:416 19
       help_test_attachment_menus test-attachment-menus.js:518 39
Keywords: checkin-needed
Attached patch fetch.patchSplinter Review

This fixes the tests locally, debug is removed, comments are formatted. I'll leave landing etc. to the sheriff.

Attachment #9092140 - Attachment is obsolete: true
Flags: needinfo?(alta88)
Attachment #9093917 - Flags: review+

Thanks. Note that linting complains about return !Boolean(uri.username || uri.userPass);.

Attachment #9093642 - Attachment is obsolete: true

Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/292f45f41c67
Don't fetch if parts_on_demand or URL with credentials. r=mkmelin

Status: ASSIGNED → RESOLVED
Closed: 5 months ago
Resolution: --- → FIXED
Attachment #9092140 - Flags: approval-comm-esr68+
Attachment #9092140 - Flags: approval-comm-beta+
Attachment #9093917 - Flags: approval-comm-esr68+
Attachment #9093917 - Flags: approval-comm-beta+
No longer blocks: 1576584
Duplicate of this bug: 1576584
Duplicate of this bug: 1581553

Cleaning out confusing dependencies.

No longer blocks: 1443750
No longer depends on: 1454542, 1519387
Duplicate of this bug: 1582668
You need to log in before you can comment on or make changes to this bug.