Last Comment Bug 681647 - Get rid of import code dependence on nsIEditor
: Get rid of import code dependence on nsIEditor
Status: RESOLVED FIXED
:
Product: MailNews Core
Classification: Components
Component: Import (show other bugs)
: Trunk
: x86_64 Windows 7
: -- normal with 1 vote (vote)
: Thunderbird 11.0
Assigned To: David :Bienvenu
:
Mentors:
Depends on:
Blocks: 684466
  Show dependency treegraph
 
Reported: 2011-08-24 07:47 PDT by David :Bienvenu
Modified: 2011-12-01 16:00 PST (History)
3 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
wip (60.45 KB, patch)
2011-08-26 10:01 PDT, David :Bienvenu
no flags Details | Diff | Review
remove Outlook cid munging (64.52 KB, patch)
2011-08-26 10:39 PDT, David :Bienvenu
no flags Details | Diff | Review
de-bitrotted patch (68.10 KB, patch)
2011-10-26 17:43 PDT, David :Bienvenu
no flags Details | Diff | Review
this gets import working again, and fixes a bug in outlook import (69.08 KB, patch)
2011-10-27 18:20 PDT, David :Bienvenu
no flags Details | Diff | Review
fix for crash and incorrect parenting (68.71 KB, patch)
2011-10-28 15:53 PDT, David :Bienvenu
standard8: review+
Details | Diff | Review
fix addressing Standard8's comments (81.18 KB, patch)
2011-11-07 10:02 PST, David :Bienvenu
mozilla: review+
Details | Diff | Review
fix addressing Neil's comments (68.41 KB, patch)
2011-11-23 11:36 PST, David :Bienvenu
neil: superreview+
Details | Diff | Review

Description David :Bienvenu 2011-08-24 07:47:23 PDT
Whenever Ehsan changes the nsIEditor interface while improving the editor code, TB Tinderbox turns red. This is because the import code creates a dummy editor to pass to nsIMsgSend to create an rfc822 message, in order to do embedded attachments (MHTML). By adding a new method to nsIMsgSend specifically for this, I can get rid of the dummy editors in the eudora and outlook import code.
Comment 1 David :Bienvenu 2011-08-25 09:03:44 PDT
ugh, this is turning out to be a bit difficult because the nsMsgSend code that process multipart related (nsMsgSend::ProcessMultipartRelated) calls SetSrc on the nsIDOMHTMLImageElement, which is our fake image element, and nsEudoraHTMLImageElement::SetSrc updates the cid:s in the editor's body text, and then nsMsgSend::ProcessMultipartRelated asks the editor for the new message body. But since we've thrown away the editor for import, we can't get the tweaked body.

Outlook import doesn't do the same thing in its image elements (they don't use the editor), so either multipart related doesn't quite work with outlook import, or it works differently. I don't have Outlook installed so it's hard to test that.
Comment 2 David :Bienvenu 2011-08-25 09:13:43 PDT
Ah, Outlook import updates the cids after the compose process has created the msg file, in CopyComposedMessage. Eudora import can do the same thing.
Comment 3 David :Bienvenu 2011-08-25 11:51:09 PDT
It looks like Eudora fixes the cids in the html message body to match the cid's generated by mime_gen_content_id, whereas Outlook alters the content-id of the generated parts to match the html cids when converting from the sent msg file to the output stream. I'm not sure if one approach is better. Outlook import does a line by line conversion of the send msg file looking for content-id headers, but Eudora doesn't do a line by line parsing, and adding one would be a pain.

I think a third approach might work better - set an attribute on the import code fake dom nodes that is the cid to use, and check for that attribute in nsMsgComposeAndSend::ProcessMultipartRelated, and if it's set, don't generate a content id, but rather, use the original.  That way, we use the right, original cid up front, and don't have to tweak the final message. If I use an attribute only the import code would set, it won't affect normal message sending.
Comment 4 David :Bienvenu 2011-08-26 10:01:45 PDT
Created attachment 556058 [details] [diff] [review]
wip

This works for Eudora import, which I've tested fairly extensively. I don't yet have a copy of Outlook to test yet. Once I do, I think I can simplify the Outlook import code's handling of embedding images like I did for Eudora import.
Comment 5 David :Bienvenu 2011-08-26 10:39:39 PDT
Created attachment 556070 [details] [diff] [review]
remove Outlook cid munging

this patch depends slightly on the patch in bug 679476
Comment 6 David :Bienvenu 2011-08-26 15:07:24 PDT
Outlook import seems to be working fine, but I'll wait for the patch in bug 679476 to get reviewed and land before asking for review for this patch.
Comment 7 Mike Kaganski 2011-08-28 20:44:26 PDT
That's great modification; this is what I missed greatly when had to do those ugly things with cids.

(In reply to David :Bienvenu from comment #3)
>I'm not sure if one approach is better

The reason to keep the origial cids is that the cids may be used not only in the message bodies, but also in other MIME parts, and replacing all possible occurencies may be impossible (e.g., some other part may be encoded).

Your approach is undoubtedly the clearest, and allows to keep the original cids that was the ultimate goal of those hacks in the outlook import code. I think that this will greatly speed up the import of mailboxes with many messages with embedded attachments.

After your changes, the nsOutlookEditor class becomes just a container for a nsISupportsArray that holds embedded images, and that we later pass to CreateRFC822Message (previously nsOutlookEditor itself was passed to CreateAndSendMessage). Maybe it's good to completely abandon this class, and use that array directly?

And isn't
 NS_IMPL_ISUPPORTS0(nsOutlookEditor)
the same as
 NS_IMPL_ISUPPORTS1(nsOutlookEditor, nsISupports)
(https://developer.mozilla.org/en/Implementing_QueryInterface#The_NS_IMPL_QUERY_INTERFACE.5b012.5d_macros)?
Comment 8 David :Bienvenu 2011-08-28 21:37:00 PDT
thx for the feedback, Mike. Getting rid of nsOutlookEditor was definitely something that occurred to me and makes sense at some point, but I'm trying to keep this patch a little bit smaller.

Yeah, NS_IMPL_ISUPPORTS0 is better.
Comment 9 Mike Kaganski 2011-09-03 06:13:28 PDT
Some thoughts on the newly suggested interface for creating RFC822 messages (CreateRFC822Message).

First, I wanted to ask if there is a specific reason why the ability to create digest messages was removed from it?

This API is quite a step forward. Still, it seems to be just a partial solution. For example, it needs IDOMHTMLImageElement to embed images. It will not allow to create multipart/alternative messages. As it is made from the code that is used to send messages, it can only those things that are used in the Mozilla sending engine.

I imagine another approach, where a library would be created, that would allow to create an object of type RFC822Message, that would be a top-level object of a tree similar to those used in XML parsers. It would be able to hold a wide range of objects that represent valid message parts (headers, body, MIME objects, including objects of its own class, e.g. if they represent an rfc822 attachment). It would have methods like AddHeader, AddTextPlain, AddMultipartRelative, AddMultipartAlternative, AddAttachment etc, and it would implement the RFCs that define the internet message format (822, MIME etc), every piece of it, and nothing else. It would not try to generate identifiers, they would need to be specified by callers. It would not have knowledge if an attachment is an image or not. It would require non-textual data (like binary attachments) passed as streams. But it would handle encodings (quoted-printable, base64, encoding of headers), escaping etc. One would need to create this object, then add its data one-by-one (headers, higher-level multipart MIME objects, and in they their texts, attachments, other MIME parts). And then call a method to dump the resulting message to a stream.

I understand that this is not going to happen in this patch. But it would allow to create a solid base that would be suitable for use both in sending messages as well as in parsing received and importing foreign. It would be flexible enough to allow for excluding some tasks such as escaping/splitting lines (that makes it necessary to introduce time-consuming hacks on import to undo their adverse effects).
Comment 10 David :Bienvenu 2011-09-12 18:05:26 PDT
(In reply to Mike Kaganski from comment #9)

Thx for your comments, Mike.
> First, I wanted to ask if there is a specific reason why the ability to
> create digest messages was removed from it?
No Thunderbird code has used that ability for at least 10 years, and that method already has a large number of parameters.
> 
> This API is quite a step forward. Still, it seems to be just a partial
> solution. For example, it needs IDOMHTMLImageElement to embed images. It
> will not allow to create multipart/alternative messages. As it is made from
> the code that is used to send messages, it can only those things that are
> used in the Mozilla sending engine.
Yeah, the point was to get rid of the dependency on nsIEditor, because that interface is changing frequently...did we used to be able to create multipart/alternative messages when doing import? If so, I should fix that.

Your other approach is definitely something that sounds good long term, but is beyond the scope of this bug.
Comment 11 Mike Kaganski 2011-09-12 20:03:57 PDT
(In reply to David :Bienvenu from comment #10)
> did we used to be able to create
> multipart/alternative messages when doing import? If so, I should fix that.
No, the former code didn't allow it. Excuse me for being indistinct. I only wanted to show a field that may be further improved, not any regression.

> Your other approach is definitely something that sounds good long term, but
> is beyond the scope of this bug.
Yes, I understand this. Of course, this bug must be self-contained, and there's no point to make it a never-ending story :) Just hope that that suggestion will not vanish long term - that's why I posted it (don't know if there's something like a wishlist where it is more appropriate to post things like that).
Comment 12 David :Bienvenu 2011-10-26 17:43:42 PDT
Created attachment 569853 [details] [diff] [review]
de-bitrotted patch

this was extremely bit-rotted - it should compile now, but I haven't tested it yet.
Comment 13 David :Bienvenu 2011-10-27 18:20:39 PDT
Created attachment 570142 [details] [diff] [review]
this gets import working again, and fixes a bug in outlook import

Besides getting rid of the editor dependency, this patch also fixes a crash in Outlook import introduced by the recent de-proxying work, so it's a little bit urgent.
Comment 14 David :Bienvenu 2011-10-27 18:26:09 PDT
Comment on attachment 570142 [details] [diff] [review]
this gets import working again, and fixes a bug in outlook import

ugh, this fixes the crash, but seems to break the parenting of imported folders (part of the crash fix). I'll keep investigating.
Comment 15 David :Bienvenu 2011-10-28 15:53:57 PDT
Created attachment 570400 [details] [diff] [review]
fix for crash and incorrect parenting

This fixes the parenting issue, and the crash. The nsImportMail.cpp change contain the crash fix and we definitely want to land that soon (I think the regression is only on the trunk). I can land that separately if that helps with the reviewing.
Comment 16 Mark Banner (:standard8) 2011-11-07 03:49:25 PST
Comment on attachment 570400 [details] [diff] [review]
fix for crash and incorrect parenting

Review of attachment 570400 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/import/outlook/src/nsOutlookCompose.cpp
@@ +331,4 @@
>    // End Bug 593907
>  
> +  const wchar_t * body = bodyW.IsEmpty() ? msg.GetBody() : bodyW.get();
> +  nsRefPtr<nsOutlookEditor> pOutlookEditor = new nsOutlookEditor();

nit: we should probably check this for OOM.

::: mailnews/import/public/nsIImportService.idl
@@ +71,4 @@
>    nsIImportABDescriptor    CreateNewABDescriptor();
>    nsIImportGeneric      CreateNewGenericMail();
>    nsIImportGeneric      CreateNewGenericAddressBooks();
> +  void CreateRFC822Message(in nsIMsgIdentity aIdentity,

This file needs a uuid bump.
Comment 17 David :Bienvenu 2011-11-07 10:02:04 PST
Created attachment 572510 [details] [diff] [review]
fix addressing Standard8's comments

carrying forward review, asking for sr.
Comment 18 David :Bienvenu 2011-11-07 16:03:07 PST
I landed part of this patch to fix the outlook import crash before we merge branches, http://hg.mozilla.org/comm-central/rev/e5917383a6ae
Comment 19 neil@parkwaycc.co.uk 2011-11-09 16:31:13 PST
Comment on attachment 572510 [details] [diff] [review]
fix addressing Standard8's comments

>+    bodyStr = NS_ConvertASCIItoUTF16(m_attachment1_body);
Do we know that the body is always ASCII?

>+      nsCOMPtr<nsIDOMElement> element(do_QueryInterface(node));
>+      if (element)
>+        element->GetAttribute(NS_LITERAL_STRING(kMozCIDAttrName), contentId);
Unless you can come up with a compelling reason to use an attribute, I'd prefer you to use nsIDOMNode::GetNodeValue; real elements never have a value.

>+  rv = Init(aUserIdentity, nsnull, (nsMsgCompFields *)aFields, nsnull,
>+          PR_FALSE, PR_TRUE, mode, nsnull,
Nit: could do with an extra two spaces of indentation here.

>+  // We only get the editor interface when there's embedded content.
>+  // Otherwise pEditor remains NULL. That way we only import with the pseudo
>+  // editor when it helps.
Didn't you remove pEditor completely?

>+  nsRefPtr<nsEudoraEditor>  pEudoraEditor = new nsEudoraEditor(m_pBody, pMailImportLocation);
>+  nsCOMPtr<nsISupportsArray> embeddedObjects;
>+  nsString uniBody;
>+  if (pEudoraEditor->HasEmbeddedContent())
>+    pEudoraEditor->GetEmbeddedObjects(getter_AddRefs(embeddedObjects));
It looks as if this is the only use of the nsEudoraEditor. Could it be turned into a local variable perhaps, like this:
nsEudoraEditor eudoraEditor(m_pBody, pMailImportLocation);
if (eudoraEditor.HasEmbeddedContent())
  eudoraEditor->GetEmbeddedObjects(getter_AddRefs(embeddedObjects));
On second thoughts, file a followup bug to move HasEmbededContent and GetEmbeddedObjects to be methods on nsEudoraCompose instead.

>-  nsString uniBody;
[Why did this move?]


> NS_IMETHODIMP nsEudoraHTMLImageElement::SetSrc(const nsAString & aSrc)
> {
>-  nsEudoraEditor *    pEditor = static_cast <nsEudoraEditor *> (static_cast <nsIEditor *> (m_pEditor.get()));
>-
>-  if ( pEditor->UpdateEmbeddedImageReference(m_cidHash, m_src, aSrc) )
>-    m_cidHash = 0;
>   m_src = aSrc;
>-
>   return NS_OK;
Change this to NS_ERROR_NOT_IMPLEMENTED?

Haven't looked at Outlook yet.
Comment 20 neil@parkwaycc.co.uk 2011-11-13 16:33:11 PST
Comment on attachment 572510 [details] [diff] [review]
fix addressing Standard8's comments

>   if (GenerateHackSequence(msg.GetBody(), msg.GetBodyLen()))
>     HackBody(msg.GetBody(), msg.GetBodyLen(), bodyW);
>+  else
>+    bodyW = msg.GetBody();
So, what happened to this block:
>-  else {
>-    if (bodyW.IsEmpty())
>-      msg.GetBody(bodyA);
>-    else
>-      nsMsgI18NConvertFromUnicode(msg.GetBodyCharset(), bodyW, bodyA);
>-  }

>+  const wchar_t * body = bodyW.IsEmpty() ? msg.GetBody() : bodyW.get();
Not sure what this is doing. I can't even see where body is used!

>+  nsRefPtr<nsOutlookEditor> pOutlookEditor = new nsOutlookEditor();
>+  NS_ENSURE_TRUE(pOutlookEditor, NS_ERROR_OUT_OF_MEMORY);
> 
>   if (msg.BodyIsHtml()) {
>     for (unsigned int i = 0; i<msg.EmbeddedAttachmentsCount(); i++) {
>       nsIURI *uri;
>       const char* cid;
>       const char* name;
>       if (msg.GetEmbeddedAttachmentInfo(i, &uri, &cid, &name))
>         pOutlookEditor->AddEmbeddedImage(uri, NS_ConvertASCIItoUTF16(cid).get(), NS_ConvertASCIItoUTF16(name).get());
>     }
>   }
Well, this is even worse than the Eudora case, because we have a msg object with embedded attachments, which we then pass into the OutlookEditor object, only to retrieve the attachments again. (Sadly the msg object uses nsIArray while the compose code wants an nsISupportsArray...)
Comment 21 David :Bienvenu 2011-11-23 11:36:58 PST
Created attachment 576571 [details] [diff] [review]
fix addressing Neil's comments
Comment 22 David :Bienvenu 2011-11-23 12:15:06 PST
(In reply to neil@parkwaycc.co.uk from comment #19)
> Do we know that the body is always ASCII?
It comes from here, in the Outlook case:

  nsMsgI18NConvertFromUnicode(msg.GetBodyCharset(), bodyW, bodyA);

and here for Eudora:

  nsCString body;

  rv = nsMsgI18NConvertFromUnicode( NS_LossyConvertUTF16toASCII(charSet).get(),
                                    uniBody, body);


Maybe we could avoid the extra roundtrip by simply passing in UTF16, possibly with the charset.

> Unless you can come up with a compelling reason to use an attribute, I'd
> prefer you to use nsIDOMNode::GetNodeValue; real elements never have a value.
Done.

> On second thoughts, file a followup bug to move HasEmbededContent and
> GetEmbeddedObjects to be methods on nsEudoraCompose instead.
OK, I'll file a follow up bug.
> 
> >-  nsString uniBody;
> [Why did this move?]
> 
moved back.
> 
> > NS_IMETHODIMP nsEudoraHTMLImageElement::SetSrc(const nsAString & aSrc)
> > {
> >-  nsEudoraEditor *    pEditor = static_cast <nsEudoraEditor *> (static_cast <nsIEditor *> (m_pEditor.get()));
> >-
> >-  if ( pEditor->UpdateEmbeddedImageReference(m_cidHash, m_src, aSrc) )
> >-    m_cidHash = 0;
> >   m_src = aSrc;
> >-
> >   return NS_OK;
> Change this to NS_ERROR_NOT_IMPLEMENTED?

It is implemented - we still do m_src = aSrc;
> 
For the Outlook case, I've gotten rid of the editor completely, and removed the const wchar_t * body var. It's a little more involved for Eudora, which is why I've left that for a follow-on bug.

As to what happened to the block I removed, my recollection is that bodyW won't be empty anymore, because I assigned it earlier. And I don't check for editor since there's no editor anymore.
Comment 23 neil@parkwaycc.co.uk 2011-11-23 13:25:05 PST
Comment on attachment 576571 [details] [diff] [review]
fix addressing Neil's comments

>+  nsRefPtr<nsEudoraEditor>  pEudoraEditor = new nsEudoraEditor(m_pBody, pMailImportLocation);
[Could still consider making this nsAutoPtr<nsEudoraEditor> pEudoraEditor or even just nsEudoraEditor eudoraEditor]

>+  nsString uniBody;
>+  NS_CopyNativeToUnicode(nsDependentCString(m_pBody), uniBody);
> 
>   /*
>     l10n - I have the body of the message in the system charset,
>@@ -636,9 +643,6 @@ nsresult nsEudoraCompose::SendTheMessage
> 
>   */
> 
>-  nsString uniBody;
>-  NS_CopyNativeToUnicode( nsDependentCString(m_pBody), uniBody);
[Not sure why this code moved]

>@@ -1620,27 +1032,18 @@ NS_IMETHODIMP nsEudoraHTMLImageElement::
> }
> 
> 
>-// attribute DOMString src
> NS_IMETHODIMP nsEudoraHTMLImageElement::GetSrc(nsAString & aSrc)
> {
>   aSrc = m_src;
>-
>   return NS_OK;
> }
> 
>-
[Useless changes?]

>+class nsEudoraEditor : public nsISupports
[Probably doesn't need a base class at all]

>+    nsEudoraHTMLImageElement(const nsAString & aSrc, const nsString &aCID);
[Why not both const nsAString & ?]

>+    nsOutlookHTMLImageElement(nsIURI *uri, const wchar_t *cid, const wchar_t *name);
[Might be worth switching to const nsAString & like you did with Eudora]
Comment 24 Mike Kaganski 2011-11-23 14:16:25 PST
nsEudoraCompose.cpp, line 626:
  // IMPORT_LOG0( "Outlook compose calling CreateAndSendMessage\n");

Maybe it's worth it to remove this?
Comment 25 Mike Kaganski 2011-11-23 14:53:39 PST
Also, in nsOutlookHTMLImageElement, m_src is an URI of a temp file containing the attachment data. nsOutlookHTMLImageElement::SetSrc used to be called by nsMsgNend::ProcessMultipartRelated to set new cid (that I had to revert previously, and that is not needed now). So:
1. The code
 m_src = aSrc;
is semantically wrong here (it assigns cid to variable holding URI) (though it isn't called now, that's why it causes no problem)
2. I suppose that now nsOutlookHTMLImageElement::SetSrc should simply return NS_ERROR_NOT_IMPLEMENTED.
Comment 26 David :Bienvenu 2011-12-01 16:00:46 PST
fixed, with comments addressed http://hg.mozilla.org/comm-central/rev/0999ba0a3515

Note You need to log in before you can comment on or make changes to this bug.