Closed Bug 686087 Opened 13 years ago Closed 13 years ago

add ability to stream just the message headers of a message

Categories

(MailNews Core :: Backend, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 10.0

People

(Reporter: Bienvenu, Assigned: Bienvenu)

Details

Attachments

(1 file, 2 obsolete files)

sometimes we want just to be able to stream the headers of a message without streaming the whole message. Currently, you have to stream the whole message and then ignore everything after the headers, which can be a bit cpu-intensive if you have a 10MB message. But I can add a method the nsIMsgMessageService that just streams the headers. For imap, this involves fetching all the headers; for local messages, and messages in the offline store, this involves just streaming the headers.
Attached patch wip with unit test (obsolete) — Splinter Review
This patch basically works. I haven't tried local folders, and I haven't implemented imap online header fetching, but I don't think I have to for this to be useful to gloda and gloda clients, because I think the only fetch local messages anyway.
Status: NEW → ASSIGNED
OS: Windows 7 → All
Hardware: x86_64 → All
(In reply to David :Bienvenu from comment #1)
> This patch basically works. I haven't tried local folders, and I haven't
> implemented imap online header fetching, but I don't think I have to for
> this to be useful to gloda and gloda clients, because I think the only fetch
> local messages anyway.

Right, gloda only streams messages if they are a) marked offline or b) live in local folders.  Having said that, I could see utility for hypothetical extensions to be able to get at headers that aren't on the nsIMsgDBHdr.  Of course, those extensions should really just set the pref so they end up on the header, probably.  Also, only protz writes gloda extensions.

I'm not sure gloda can immediately benefit from this, since when we stream the message, we want to know both the body structure and the contents of the textual body parts so that we can fulltext index them...
I should have said "Conversations" because I think there are cases where conversations is streaming a message just to get at a particular header, but streaming it in such a way as to avoid getting all the attachments.
Yes this is definitely a must for conversation as there's a lot of places we stream the message just to get the headers (when replying, when hitting "show details"...). Ideally, this would appear transparently to the JS mime emitter. I'll look into it sometime this weekend.
protz, I will get a patch up that builds against the trunk in a minute - I was going to ask you for review anyway :-)
Attached patch proposed fix with unit test (obsolete) — Splinter Review
Attachment #563263 - Attachment is obsolete: true
Attachment #563587 - Flags: review?(jonathan.protzenko)
Comment on attachment 563587 [details] [diff] [review]
proposed fix with unit test

requesting sr from asuth for the interface change
Attachment #563587 - Flags: superreview?(bugmail)
Comment on attachment 563587 [details] [diff] [review]
proposed fix with unit test

I decree that the addition of this new method is useful and will not impact JS consumers and C++ consumers will only need a rebuild with the new signature and everything is shiny.
Attachment #563587 - Flags: superreview?(bugmail) → superreview+
Comment on attachment 563587 [details] [diff] [review]
proposed fix with unit test

Review of attachment 563587 [details] [diff] [review]:
-----------------------------------------------------------------

I'm deeply sorry the review took so long. I'm really busy right now and I'm having trouble finding time for things like these, when all of the time I have for Mozilla is already devoted to the account provisioner. Hopefully things should get back to a more stable state once I'm done with it.

This looks pretty right, and I think you deserve a r+ assuming you answer to the questions I have :). Thanks! I'll take care of providing a nice JS wrapper on top of this, probably in a followup bug.

::: mailnews/base/public/nsIMsgMessageService.idl
@@ +237,5 @@
> +   *
> +   * @return the URL that gets run, if any.
> +   */
> +  nsIURI streamHeaders(in string aMessageURI, in nsIStreamListener aConsumer,
> +                       in nsIUrlListener aUrlListener, 

I don't know (yet) how we're going to expose that through a nice API on the JS side of things. I'm thinking about something added to mimemsg.js (from mailnews/db/gloda/modules). I guess it would be unnecessary to make things run through libmime, so I tend to think that aAdditionalHeader is not needed here. I'm thinking about just exploding the string obtained on the "\n" boundaries, splitting each line on ":" as [key, value] and running value through GlodaUtils.deMime.

::: mailnews/base/util/nsMsgUtils.cpp
@@ +2182,5 @@
>  }
> +
> +NS_MSG_BASE nsresult
> +MsgStreamMsgHeaders(nsIInputStream *aInputStream, nsIStreamListener *aConsumer,
> +                    nsIUrlListener *aUrlListener, nsIMsgWindow *aMsgWindow)

the aMsgWindow parameter seems to be unused; the aUrlListener parameter too seems to be unused

@@ +2191,5 @@
> +
> +  nsCAutoString msgHeaders;
> +  nsCAutoString curLine;
> +
> +  bool more = PR_TRUE;

It felt almost weird not having the PRBool here. :)

@@ +2198,5 @@
> +  while (more)
> +  {
> +    rv = NS_ReadLine(aInputStream, lineBuffer, curLine, &more);
> +    NS_ENSURE_SUCCESS(rv, rv);
> +    if (curLine.IsEmpty())

Just out of curiosity: who's taking care here of \r\n vs \n?

@@ +2212,5 @@
> +  nsCOMPtr<nsIInputStreamPump> pump;
> +  rv = NS_NewInputStreamPump(getter_AddRefs(pump), hdrsStream);
> +  NS_ENSURE_SUCCESS(rv, rv);
> +
> +  // do we need a context for the asyncread?

I don't think I'm the right person to answer this question. The documentation is there, but is very good at NOT telling you what the context is supposed to be.

http://mxr.mozilla.org/comm-central/source/mozilla/netwerk/base/public/nsIInputStreamPump.idl#104

::: mailnews/imap/src/nsImapService.cpp
@@ +1283,5 @@
> +    nsCOMPtr<nsIURI> url(do_QueryInterface(imapUrl));
> +    nsCOMPtr<nsICacheEntryDescriptor> cacheEntry;
> +    bool msgInMemCache = false;
> +    rv = IsMsgInMemCache(url, folder, getter_AddRefs(cacheEntry), &msgInMemCache);
> +    NS_ENSURE_SUCCESS(rv, rv);

Why should the message always be in the mem cache? I'm probably not understanding the dynamics fully here, but this sounds like a sketchy assumption if things are as I think they are. We could get a reference to a nsIMsgDBHdr through a Gloda query without ever streaming the message in the first place.
Attachment #563587 - Flags: review?(jonathan.protzenko) → review+
(In reply to Jonathan Protzenko [:protz] from comment #9)
> I'm deeply sorry the review took so long. I'm really busy right now and I'm
> having trouble finding time for things like these,
No problem, I did it for you, and until you can actually take advantage of it in Conversations, there's not a big hurry...
> ::: mailnews/base/public/nsIMsgMessageService.idl
> @@ +237,5 @@
> > +   *
> > +   * @return the URL that gets run, if any.
> > +   */
> > +  nsIURI streamHeaders(in string aMessageURI, in nsIStreamListener aConsumer,
> > +                       in nsIUrlListener aUrlListener, 
> 
> o I tend to think that aAdditionalHeader is not needed
> here.
Right, I'm not taking an aAdditionalHeader param.

> the aMsgWindow parameter seems to be unused; the aUrlListener parameter too
> seems to be unused

Oh, yeah, they should go away. They might be an issue if/when I implement this for non locally cached messages (i.e., we get the headers from the imap server), but I don't think that will affect this function. 
> Just out of curiosity: who's taking care here of \r\n vs \n?

it will always be /r/n by the time it gets to you, but NS_ReadLine handles it internally. I could make it be MSG_LINEBREAK, if you'd prefer, but since the output of this method shouldn't get filed directly to disk, native line endings aren't that important.
> 
> @@ +2212,5 @@
> > +  nsCOMPtr<nsIInputStreamPump> pump;
> > +  rv = NS_NewInputStreamPump(getter_AddRefs(pump), hdrsStream);
> > +  NS_ENSURE_SUCCESS(rv, rv);
> > +
> > +  // do we need a context for the asyncread?
It doesn't appear that we do, so I've just removed the comment.
> 
> 
> ::: mailnews/imap/src/nsImapService.cpp
> @@ +1283,5 @@
> > +    nsCOMPtr<nsIURI> url(do_QueryInterface(imapUrl));
> > +    nsCOMPtr<nsICacheEntryDescriptor> cacheEntry;
> > +    bool msgInMemCache = false;
> > +    rv = IsMsgInMemCache(url, folder, getter_AddRefs(cacheEntry), &msgInMemCache);
> > +    NS_ENSURE_SUCCESS(rv, rv);
> 
> Why should the message always be in the mem cache? I'm probably not
> understanding the dynamics fully here, but this sounds like a sketchy
> assumption if things are as I think they are. We could get a reference to a
> nsIMsgDBHdr through a Gloda query without ever streaming the message in the
> first place.
This isn't assuming the message is in the mem/disk cache. It's just checking that the code to find that out isn't failing. But we are only handling the case that aLocalOnly is true, because I think that should be sufficient for now. In other words, if aLocalOnly is false, and we don't have the message in the offline store or the mem/disk cache, this function will return failure. Earlier, we checked if the message was in the offline store.
this is what I plan on landing in the next day or two.
Attachment #563587 - Attachment is obsolete: true
Attachment #564720 - Flags: superreview+
Attachment #564720 - Flags: review+
fixed on trunk - http://hg.mozilla.org/comm-central/rev/b83fba0bc529
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 10.0
David, I just looked into this and it's not much use for me in its current state: I have no way of running the headers through libmime so that they're decoded, with the line-wrapping taken into account, etc. Any hints? :-)
Protz, which headers do you want to decode? The basic logic would be to parse each line until you found the line starting with the header you want (e.g., To:) and then accumulate lines until you find an other header (i.e., a line that doesn't start with whitespace). Then, you can use the mime converter service, "@mozilla.org/messenger/mimeconverter;1", nsIMimeConverter::decodeMimeHeader. For the charset, you should be able to use msgHdr.Charset.

As far as the whitespace folding is concerned, http://mxr.mozilla.org/comm-central/ident?i=MimeHeaders_get shows you libmime does. 

Or, maybe you could simply use nsIMimeHeaders, passing in all the headers you get streamed, and ask it to extract out the header you want. I think that'll do the whitespace foldering/continuation stuff for you (but not the decoding).
You need to log in before you can comment on or make changes to this bug.