Closed Bug 842632 Opened 11 years ago Closed 10 years ago

Improve the nsIMsgHeaderParser interface

Categories

(MailNews Core :: MIME, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 29.0

People

(Reporter: jcranmer, Assigned: jcranmer)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Keywords: addon-compat, arch, Whiteboard: [external-api-bustage])

Attachments

(12 files, 22 obsolete files)

9.95 KB, patch
Irving
: review+
Details | Diff | Splinter Review
18.38 KB, patch
rkent
: review+
Details | Diff | Splinter Review
12.52 KB, patch
Details | Diff | Splinter Review
52.48 KB, patch
iannbugzilla
: review+
Details | Diff | Splinter Review
39.95 KB, patch
Irving
: review+
Details | Diff | Splinter Review
13.68 KB, patch
rkent
: review+
Details | Diff | Splinter Review
3.46 KB, patch
standard8
: review+
Details | Diff | Splinter Review
16.67 KB, patch
neil
: review+
Details | Diff | Splinter Review
22.88 KB, patch
neil
: review+
Details | Diff | Splinter Review
26.49 KB, patch
neil
: review+
Details | Diff | Splinter Review
36.26 KB, patch
jcranmer
: review+
Details | Diff | Splinter Review
1.06 KB, patch
jcranmer
: review+
Details | Diff | Splinter Review
Okay, when I was first sketching out a work plan for jsmime, I assumed that transitioning most of the auxiliary interfaces to use jsmime would be simple. This is not, since it is a fairly mashed-up interface that tries to do too many things and is too easy to mess up. The real problem is that the "correct" interface for this is difficult to express in XPIDL easily, and the compose window tries to use a pretty display but doesn't fully commit itself to it.

The final interface I want looks like this:
/// Go from a MIME header that has yet to be RFC 2047-decoded to the list of emails
jsval parseUndecodedHeader(in ACString undecodedHeader, in string headerCharset, [optional] in bool preserveGroups)
/// Go from a 2047-decoded MIME header to the list of emails
jsval parseDecodedHeader(in AString decodedHeader, [optional] in bool preserveGroups)
/// Go from a list of emails to a MIME header (caveat emptor)
AUTF8String makeHeader(in jsval addresses)
/// This extracts a display name for the first address in the decodedHeader
AString extractName(in AString decodedHeader)
/// This strips out all addresses that are duplicates of earlier ones in decodedHeader or present in otherHeaders
AString removeDuplicateAddresses(in AString decodedHeader, [optional] in jsval otherHeaders)
/// This is effectively the constructor for an element of the list
jsval makeEmailObject(in AUTF8String name, in AUTF8String email)
/// Ditto
jsval makeGroupObject(in AUTF8String name, in jsval members)

Most of the jsvals are arrays of objects that have name/email or name/members (for groups) structs, with a toString() method that is effectively:
function toString() { return this.name ? this.name + " <" + this.email + ">" : this.email; }.

Obviously, there isn't a lot of overlap with the current header interface, so this will break nearly every extension that uses the interface. I wrote this little guide for how to transition to the new interface:
[ Note: ALL calls here should use msgHdr.mime2DecodedAuthor/etc. instead of msgHdr.author ]
parser.parseHeadersWithArray(line, {}, {}, {}, {}) -> parseDecodedHeader(line); (with lots of different accessors)
parser.extractHeaderAddressMailboxes(line) -> [ x.email for (x of parseDecodedHeader(line))].join(", ");
parser.extractHeaderAddressNames(line) -> [x.name || x.email for (x of parseDecodedHeader(line))].join(", ");
parser.extractHeaderAddressName(line) -> parser.extractName(line)
parser.makeFullAddress(name, email) -> parser.makeHeader({name: name, email: email}) OR parser.makeEmailObject(name, email).toString()
parser.unquotePhraseOrAddr -> should be unnecessary (parse*Header, extractName unquotes all strings)
parser.reformatUnquotedAddresses(line) -> should also be unnecessary, but this should get you there:
  parser.makeHeader(parser.parseDecodedHeader(line, true))

Also obviously, this isn't really usable by C++ as-is, so I've taken the time to create a new API for C++ that uses nsTArray where appropriate, and have managed to replace every single use in C++ code of nsIMsgHeaderParser by this new API.

The other problem is that I need to alter the fundamental viewpoint of the addressing widgets in compose code so that they can internally keep track of values in the "proper" format. For this reason, the set of patches that I will eventually produce will both be extremely tightly coupled (I won't risk landing them except as a group) and probably the most dangerous set of patches I've written to date.
(In reply to Joshua Cranmer from comment #0)
> The real problem is that the
> "correct" interface for this is difficult to express in XPIDL easily
You mean without using [array, size_is(count)] everywhere?

interface nsIMsgEmailAddress : nsISupports {
  attribute AUTF8String name;
  attribute AUTF8String email;
  AUTF8String toString();
}

> jsval parseUndecodedHeader(in ACString undecodedHeader, in string
> headerCharset, [optional] in bool preserveGroups)
void decodeAndParseHeader(in ACString header, in string headerCharset, [optional] in bool preserveGroups, [optional] out unsigned long count, [retval, array, size_is(count)] out nsIMsgEmailAddress addresses)

> AUTF8String makeHeader(in jsval addresses)
AUTF8String makeHeader([array, size_is(count)] in nsIMsgEmailAddress addresses, in unsigned long count)

Note that XPConnect will of course allow you to write stuff such as
var header = MIME.makeHeader([{name: name, email: email}]);
(In reply to neil@parkwaycc.co.uk from comment #1)
> (In reply to Joshua Cranmer from comment #0)
> > The real problem is that the
> > "correct" interface for this is difficult to express in XPIDL easily
> You mean without using [array, size_is(count)] everywhere?
> 
> interface nsIMsgEmailAddress : nsISupports {
>   attribute AUTF8String name;
>   attribute AUTF8String email;
>   AUTF8String toString();
> }

It's more complicated than that. What it actually looks like is an array of objects which are either {name, email} or {name, members}. Doing this in xpidl requires setting up three interfaces, specifying inheritance (which implies that I need to support QI on all the methods then), and then using all the array goop.

Since I have a C++ wrapping API (a set of global functions in mozilla/mailnews/MimeHeaderParser.h), I don't need to make the API very comfortable to use from C++, and I think this is a case where exuberant use of XPIDL and XPCOM makes everything reek.
Keywords: addon-compat
I'm not suicidal enough to request review or even feedback on this patch. This is merely an overview of the cumulative changes of the several dozen patches in my queue. I am posting this merely to give the eventual reviews on idea of what things should look like in the end for several of the APIs.

That said, I am increasingly of the opinion that the entire way we treat message headers in the codebase is just plain wrong, since while mime2Decoded* does the RFC 2047 conversion most correctly, it applies it at the wrong time. Fixing that for our entire codebase and addons is... an even more adventuresome endeavor.

At some point I have to tell my internal rewriting engine to say "enough is enough" and move on to more useful grounds in jsmime, so I think this API rewrite is enough for now. The logical extensions--adding generic structured header parsing APIs and converting nsIMsgCompFields and nsIMsgDBHdr to act like them--I think I will leave for interested third parties or for myself a few years down the line.
What a monster work. Thanks jcranmer! :)
Keywords: arch
Comment on attachment 732176 [details] [diff] [review]
Overview of where things should end up

Review of attachment 732176 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/mime/public/nsIMsgHeaderParser.idl
@@ +45,5 @@
> +   * @param preserveGroups  If false (the default), the result is a flat array
> +   *                        of email objects, containing new group objects.
> +   * @return                An array corresponding to the header description.
> +   */
> +  jsval parseUndecodedHeader(in ACString undecodedHeader,

Just a fly-by couple of comments about the interface:

- I think this would be better as "parseEncodedHeader" (and/or include 2047). "undecoded" just doesn't flow nicely.

- I think we're generally using a<param name> rather than just <param name>
Comment on attachment 732176 [details] [diff] [review]
Overview of where things should end up

Review of attachment 732176 [details] [diff] [review]:
-----------------------------------------------------------------

Epic piece of work. Detailed review of this is going to be pretty intense, but I like where it's going.

I mostly agree with Neil that despite the ugliness of XPCOM, I'd prefer to use defined interfaces rather than jsval. Could you hack around the email vs. mailing list union by just having both fields in the IDL structure and switching on whether the email field has a value?

::: calendar/base/modules/calItipUtils.jsm
@@ +363,5 @@
>  
>          // First check the recipient list
> +        emails = hdrParser.parseUndecodedHeader(aMsgHdr.recipients, aMsgHdr.Charset);
> +        for (let recipient of emails) {
> +            if (recipient.email.toLowerCase() in emailMap) {

Is this going to trip over mail groups (no recipient.email property?)

::: mailnews/base/util/IOUtils.js
@@ +32,3 @@
>      while (str.length > 0) {
>        data += str;
> +      str = sstream.readBytes(sstream.available());

I'm a bit nervous about this - will this loop trying to read 0 bytes if data is coming slowly on the stream?
Blocks: 858337
This is the first and most massive patch in the entire set of patches. It is also the patch that scares me far more than all the other patches in my queue. Thus I'm going to post it slightly before the others so I can give reviewers more time to think about this.

What is going on here is I am distinguishing between people who want to make a MIME header for compose or to send out the header and those who are trying to display a message. The tricky part is that while the compose addressing widgets ideally fall under the latter category, they also get fed in data via MIME headers at times, so we were always conservative about when we could unquote.

The reason I'm asking several people for review is this:
bwinton - making sure that I'm not breaking any ways of entering addresses into the widget, especially when I start playing with "dangerous" display names in the address book.
mconley - reviewing that my changes to the Thunderbird UI are not broken
IanN - the above, but for SeaMonkey instead of TB (it should be identical in logic to TB, but I'll admit I didn't try running)
Standard8 - reviewing just about everything else
Neil - Superreview for more eyes on this patch
Attachment #733655 - Flags: ui-review?(bwinton)
Attachment #733655 - Flags: superreview?(neil)
Attachment #733655 - Flags: review?(mconley)
Attachment #733655 - Flags: review?(mbanner)
Attachment #733655 - Flags: review?(iann_bugzilla)
Comment on attachment 733655 [details] [diff] [review]
Part 1: Cleanly distinguish when unquoting needs to happen

>-              recipient = MailServices.headerParser.reformatUnquotedAddresses(fieldValue);
Why remove just the consumers? (Alternatively, since you're apparently having to wait to replace the method, why not update the callers later?)

>-      recipient = null;
>-      try {
>-        recipient =
>-          MailServices.headerParser.unquotePhraseOrAddrWString(inputArray[index], true);
>-      } catch (ex) {};
>-      if (!recipient)
>         recipient = inputArray[index];
>       _awSetInputAndPopup(recipient, popupValue, parentNode, templateNode);
So the user is going to see a quoted address here?

>+// but we need to wait until it is
>+// JS-implemented first
Why?

>+        let name = '', address == '';
>+        let lbracket = fieldValue.lastIndexOf("<");
>+        if (lbracket > 0) {
>+          name = fieldValue.slice(0, lbracket).trim();
>+          address = fieldValue.slice(lbracket + 1,
>+            fieldValue.indexOf(">", lbrackt)).trim();
>+        } else {
>+          address = fieldValue.trim();
>+        }
>+        cardproperty.primaryEmail = addresses.value[j];
>+        cardproperty.displayName = names.value[j];
Not sure this is right.

>+          decodedName = NS_ConvertUTF8toUTF16(pNames);
...
>+        recipient = NS_ConvertUTF8toUTF16(pAddresses);
Nit: CopyUTF8toUTF16 (you got it right later)

>+        decodedName = nsDependentCString(pNames);
The nsDependentCString doesn't prevent the copy, so might as well assign, unless you want to use ?: in the next line.

>-      result: [ '"A " <a@xxx.invalid>', "b@xxx.invalid" ]
>+      result: [ 'A  <a@xxx.invalid>', "b@xxx.invalid" ]
Huh?

>+void MakeMimeAddress(const nsAString &aName, const nsAString &aEmail,
>+                     nsAString &full)
>+{
>+  nsCOMPtr<nsIMsgHeaderParser> headerParser =
>+    do_GetService(NS_MAILNEWS_MIME_HEADER_PARSER_CONTRACTID);
>+
>+  headerParser->MakeMimeAddress(aName, aEmail, full);
>+}
Will this become the primary definition rather than using the service?

>-  rv = aParser->MakeFullAddressString(aName, aAddress,
>-                                      getter_Copies(fullAddress));
>-  if (NS_SUCCEEDED(rv) && !fullAddress.IsEmpty())
>+  fullAddress.Adopt(msg_make_full_address(aName, aAddress));
This is the only caller of MakeFullAddressString?

>+  UnquotePhraseOrAddrWString(quoted.get(), false, getter_Copies(fullAddress));
getter_Copies doesn't work on external API nsAString, would you mind using msg_unquote_phrase_or_addr directly?

>+NS_IMETHODIMP nsMsgHeaderParser::MakeMimeAddress(const nsAString &aName,
>+    const nsAString &aEmail, nsAString &fullAddress)
>+{
>+  return MakeFullAddress(aName, aEmail, fullAddress);
>+}
Why not just rename MakeFullAddress to MakeMimeAddress?
Comment on attachment 733655 [details] [diff] [review]
Part 1: Cleanly distinguish when unquoting needs to happen

Review of attachment 733655 [details] [diff] [review]:
-----------------------------------------------------------------

Haven't tested this patch yet, but just did some perusal - looks pretty reasonable. Just two nits below.

::: mail/components/compose/content/addressingWidgetOverlay.js
@@ +268,1 @@
>          recipient = inputArray[index];

nit: busted indentation on line 268 since the conditional was removed.

::: mailnews/addrbook/content/abMailListDialog.js
@@ +111,5 @@
>      {
>        cardproperty = cardproperty.QueryInterface(Components.interfaces.nsIAbCard);
>        if (cardproperty)
>        {
> +        let name = '', address == '';

Er, is the plan to eventually use that nice EmailAddress class (or something similar) you used in addressingWidgetOverlay.js here as well? Because I'd support that plan.
(In reply to Mike Conley (:mconley) from comment #9)
> ::: mailnews/addrbook/content/abMailListDialog.js
> @@ +111,5 @@
> >      {
> >        cardproperty = cardproperty.QueryInterface(Components.interfaces.nsIAbCard);
> >        if (cardproperty)
> >        {
> > +        let name = '', address == '';
> 

Is the "==" intentional?
If I ever had any nice things to say about our compose code, I now only have a never-ending string of coarse invectives.
Attachment #735722 - Flags: review?(irving)
Comment on attachment 735721 [details] [diff] [review]
Part 2: Implement a C++ API for nsIMsgHeaderParser

Note to self: :NeilAway does not match Neil when requesting review.

However much I want it to. :-P
Attachment #735721 - Flags: review? → review?(neil)
Attachment #735724 - Flags: review?(kent)
Attachment #735726 - Flags: review?(mbanner)
Attachment #735728 - Flags: review?(irving)
Attached patch Part 3e: Use the C++ API in base (obsolete) — Splinter Review
And fix the broken test.
Attachment #735729 - Flags: review?(neil)
Which involves killing almost-unusd methods on nsIMsgDBHdr and thence a brief change in local.
Attachment #735730 - Flags: review?(kent)
Attachment #735731 - Flags: review?(neil)
Attachment #735733 - Flags: review?(neil)
So I'm skipping 6 patches in my queue to put this patch here. But this shouldn't actually cause any problems: I don't touch these two files in the interim, and I don't use the API until subsequent patches anyways.
Attachment #735735 - Flags: superreview?(neil)
Attachment #735735 - Flags: review?(irving)
Blocks: 860103
Comment on attachment 733655 [details] [diff] [review]
Part 1: Cleanly distinguish when unquoting needs to happen

Review of attachment 733655 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/addrbook/content/abMailListDialog.js
@@ +121,5 @@
> +        } else {
> +          address = fieldValue.trim();
> +        }
> +        cardproperty.primaryEmail = addresses.value[j];
> +        cardproperty.displayName = names.value[j];

You want to use address and name here?
(In reply to neil@parkwaycc.co.uk from comment #8)
> Comment on attachment 733655 [details] [diff] [review]
> Part 1: Cleanly distinguish when unquoting needs to happen
> 
> >-              recipient = MailServices.headerParser.reformatUnquotedAddresses(fieldValue);
> Why remove just the consumers? (Alternatively, since you're apparently
> having to wait to replace the method, why not update the callers later?)

My original intent for this patch was to limit it to just fixing the quoted/unquoted address issues.

> >-      recipient = null;
> >-      try {
> >-        recipient =
> >-          MailServices.headerParser.unquotePhraseOrAddrWString(inputArray[index], true);
> >-      } catch (ex) {};
> >-      if (!recipient)
> >         recipient = inputArray[index];
> >       _awSetInputAndPopup(recipient, popupValue, parentNode, templateNode);
> So the user is going to see a quoted address here?

No; the intent is that everything that would eventually feed into this method has the unquoting applied at its source.

> >+// but we need to wait until it is
> >+// JS-implemented first
> Why?

At the time I wrote this patch, the API would have been way too hard to write with the current nsMsgHeaderParser implementation.

> >+        let name = '', address == '';
> >+        let lbracket = fieldValue.lastIndexOf("<");
> >+        if (lbracket > 0) {
> >+          name = fieldValue.slice(0, lbracket).trim();
> >+          address = fieldValue.slice(lbracket + 1,
> >+            fieldValue.indexOf(">", lbrackt)).trim();
> >+        } else {
> >+          address = fieldValue.trim();
> >+        }
> >+        cardproperty.primaryEmail = addresses.value[j];
> >+        cardproperty.displayName = names.value[j];
> Not sure this is right.

Good point, I'll fix it.

> >-      result: [ '"A " <a@xxx.invalid>', "b@xxx.invalid" ]
> >+      result: [ 'A  <a@xxx.invalid>', "b@xxx.invalid" ]
> Huh?

Dropping the quotes here, since splitRecipients now returns unquoted (display values) instead of MIME-ready values.

> >+void MakeMimeAddress(const nsAString &aName, const nsAString &aEmail,
> >+                     nsAString &full)
> >+{
> >+  nsCOMPtr<nsIMsgHeaderParser> headerParser =
> >+    do_GetService(NS_MAILNEWS_MIME_HEADER_PARSER_CONTRACTID);
> >+
> >+  headerParser->MakeMimeAddress(aName, aEmail, full);
> >+}
> Will this become the primary definition rather than using the service?

MakeMimeAddress is the pre-JS variant of what will eventually be MakeMimeHeader.

> >-  rv = aParser->MakeFullAddressString(aName, aAddress,
> >-                                      getter_Copies(fullAddress));
> >-  if (NS_SUCCEEDED(rv) && !fullAddress.IsEmpty())
> >+  fullAddress.Adopt(msg_make_full_address(aName, aAddress));
> This is the only caller of MakeFullAddressString?

... I think so?

> >+  UnquotePhraseOrAddrWString(quoted.get(), false, getter_Copies(fullAddress));
> getter_Copies doesn't work on external API nsAString, would you mind using
> msg_unquote_phrase_or_addr directly?

That blows. For what it's worth, I'm not planning on landing this patch independently of at least the new-API patch. Also, msg_unquote_phrase_or_addr works on char*, not PRUnichar*.

> >+NS_IMETHODIMP nsMsgHeaderParser::MakeMimeAddress(const nsAString &aName,
> >+    const nsAString &aEmail, nsAString &fullAddress)
> >+{
> >+  return MakeFullAddress(aName, aEmail, fullAddress);
> >+}
> Why not just rename MakeFullAddress to MakeMimeAddress?

That would be preferable.

(In reply to Mike Conley (:mconley) from comment #9)
> ::: mail/components/compose/content/addressingWidgetOverlay.js
> @@ +268,1 @@
> >          recipient = inputArray[index];
> 
> nit: busted indentation on line 268 since the conditional was removed.

My merge program is set up to ignore whitespace when resolving conflicts, so I missed this one.

> ::: mailnews/addrbook/content/abMailListDialog.js
> @@ +111,5 @@
> >      {
> >        cardproperty = cardproperty.QueryInterface(Components.interfaces.nsIAbCard);
> >        if (cardproperty)
> >        {
> > +        let name = '', address == '';
> 
> Er, is the plan to eventually use that nice EmailAddress class (or something
> similar) you used in addressingWidgetOverlay.js here as well? Because I'd
> support that plan.

The logic that I personally think ought to be used would be to store the data in a structured format (which the EmailAddress is a temporary stop-gap solution for until I can put this stuff on nsIMsgHeaderParser). I spent more time on the addressingWidgetOverlay.js side because that's the UI that's more important, and I was kind of hoping that ensemble would kill the mail list UI anyways. :-) There are *lots* of issues I have with mailing lists now...
(In reply to Joshua Cranmer from comment #23)
> (In reply to comment #8)
> > (From update of attachment 733655 [details] [diff] [review])
> > >-        recipient =
> > >-          MailServices.headerParser.unquotePhraseOrAddrWString(inputArray[index], true);
> > So the user is going to see a quoted address here?
> No; the intent is that everything that would eventually feed into this
> method has the unquoting applied at its source.
I'm confused. This patch fixes quoting issues, except where it doesn't?

> > >-      result: [ '"A " <a@xxx.invalid>', "b@xxx.invalid" ]
> > >+      result: [ 'A  <a@xxx.invalid>', "b@xxx.invalid" ]
> > Huh?
> Dropping the quotes here, since splitRecipients now returns unquoted
> (display values) instead of MIME-ready values.
Is it supposed to be possible to round-trip display values, and if so, how will you avoid losing the trailing space?
Comment on attachment 733655 [details] [diff] [review]
Part 1: Cleanly distinguish when unquoting needs to happen

>+++ b/mailnews/addrbook/content/abMailListDialog.js
>@@ -107,33 +107,30 @@ function GetListValue(mailList, doAdd)
>         // of elements in the addressLists attribute
>       }
>     }
>     else if (cardproperty)
>     {
>       cardproperty = cardproperty.QueryInterface(Components.interfaces.nsIAbCard);
>       if (cardproperty)
>       {
>-        var addresses = {};
>-        var names = {};
>-        var fullNames = {};
>-        var numAddresses = MailServices.headerParser.parseHeadersWithArray(fieldValue, addresses, names, fullNames);
>-        for (var j = 0; j < numAddresses; j++)
>-        {
>-          if (j > 0)
>-          {
>-            cardproperty = Components.classes["@mozilla.org/addressbook/cardproperty;1"].createInstance();
>-            cardproperty = cardproperty.QueryInterface(Components.interfaces.nsIAbCard);
>-          }
>-          cardproperty.primaryEmail = addresses.value[j];
>-          cardproperty.displayName = names.value[j];
>+        let name = '', address == '';
>+        let lbracket = fieldValue.lastIndexOf("<");
>+        if (lbracket > 0) {
>+          name = fieldValue.slice(0, lbracket).trim();
>+          address = fieldValue.slice(lbracket + 1,
>+            fieldValue.indexOf(">", lbrackt)).trim();
Typo: need lbracket rather than lbrackt
>+        } else {
>+          address = fieldValue.trim();
>+        }
>+        cardproperty.primaryEmail = addresses.value[j];
>+        cardproperty.displayName = names.value[j];
> 
>-          if (doAdd || (doAdd == false && pos >= oldTotal))
>-            mailList.addressLists.appendElement(cardproperty, false);
>-        }
>+        if (doAdd || (doAdd == false && pos >= oldTotal))
>+          mailList.addressLists.appendElement(cardproperty, false);
>         pos++;
>       }
>     }
>     i++;
>   }
The old code used to deal properly with lines containing comma separated entries, the new code does not. If you drag two existing entries onto the same line separated by a comma, it will mangle the display name in the second entries card in the address book.

>+++ b/suite/mailnews/compose/addressingWidgetOverlay.js
>+// TODO: This should go on the header parser, but we need to wait until it is
>+// JS-implemented first
>+function EmailAddress(fullAddress) {
>+  if (fullAddress.contains('<')) {
>+    let addrstart = fullAddress.lastIndexOf('<');
>+    let addrend = fullAddress.lastIndexOf('>');
>+    this.name = fullAddress.slice(0, addrstart).trim();
>+    this.email = fullAddress.slice(addrstart + 1, addrend).trim();
>+  } else {
>+    this.name = '';
>+    this.email = fullAddress;
Does this need a .trim()?
>+  }
>+}

I won't mention what others have already said, so only one other small nit - these bits of code for converting fieldValue/fullAddress into name and email do the same thing but are coded differently - all should probably use the if (foo.contains('<')) style.
r- for the moment as I would like to see and test the new version of the patch.
Attachment #733655 - Flags: review?(iann_bugzilla) → review-
Comment on attachment 733655 [details] [diff] [review]
Part 1: Cleanly distinguish when unquoting needs to happen

Review of attachment 733655 [details] [diff] [review]:
-----------------------------------------------------------------

I've no general objections to this, so I think I'll take a look once other's review comments have been addressed.
Attachment #733655 - Flags: review?(mbanner)
Comment on attachment 735726 [details] [diff] [review]
Part 3c: Use the C++ API in addrbook

Review of attachment 735726 [details] [diff] [review]:
-----------------------------------------------------------------

General idea looks fine, but I'll do a full review/test once part 1 is ready/has landed.

::: mailnews/addrbook/test/unit/test_collection.js
@@ +223,5 @@
>    },
>  
>    checkCardResult : function (aDetails, overrideMailFormat) {
>      try {
> +      dump(uneval(aDetails) + '\n');

This looks like a debug statement. Do we really need it? (if so it needs explanation in the debug).
Attachment #735726 - Flags: review?(mbanner) → feedback+
(In reply to Joshua Cranmer [:jcranmer] from comment #23)
> No; the intent is that everything that would eventually feed into this
> method has the unquoting applied at its source.
...
> Dropping the quotes here, since splitRecipients now returns unquoted
> (display values) instead of MIME-ready values.

The SourceThing wrapper that I'm using in the sepsis console might be relevant.

The problem is making sure caveats like this stay in mind for whomever is touching the code at any given time.  That person may not even be the original author, and so with a (probably) shallower understanding, may not even know that those caveats exist, let alone that he or she needs to be paying attention to them when making changes.

By coincidence, the day after you left this comment, I was looking at some code I'd written, where it's passing strings around, and sometimes it's escaped and sometimes it isn't.  I was worried that in the future someone (possibly me) might come along and touch the code, and get it wrong (pass or operate on some escaped or unescaped strings when it's believed to be the other way around).

Stuff like this--what I'm dealing with and yours, too--is particularly susceptible to oversight -> error -> breakage with code changes.  In particular, I was worried that it would manifest itself as a security issue.  (This is what happened with the devtools team in bug 771859, for example.)  So I wrote a wrapper to mitigate this, and make sure to use only the wrapper's methods to access it.

Have a look:
http://hg.mozilla.org/users/sevenspade_gmail.com/sepsis/file/48207ed5a270/resources/content/utils/ids.js#l87
Search for occurrences of "SourceThing" in consoleAutoComplete.js:
http://hg.mozilla.org/users/sevenspade_gmail.com/sepsis/file/48207ed5a270/resources/content/utils/consoleAutoComplete.js

(I actually made further changes to the wrapper this morning to require at instantiation a boolean flag indicating what the internal format is, rather than some ~unreliable inferences I had it make originally.)
(In reply to Ian Neal from comment #25)
> The old code used to deal properly with lines containing comma separated
> entries, the new code does not. If you drag two existing entries onto the
> same line separated by a comma, it will mangle the display name in the
> second entries card in the address book.

It actually turns out, since the dialog is modal, that you can't drag to it on Windows, but you can on Linux. And the more I dig into this code, the more I have to restrain myself from rewriting the entire UI of Thunderbird. Lots of swearing is in order here, but I have a fix for this in a new patch I'll be uploading shortly.

(In reply to neil@parkwaycc.co.uk from comment #24)
> I'm confused. This patch fixes quoting issues, except where it doesn't?

There's one or two places where quoting still happens, since parseHeadersWithArray isn't set to unquote in maximal full aggressive unquoting; however, all of those end up being changed somewhere down the line of all the patches. There was also a bug in the AB mailing list dialog (seriously, I hate that code with a passion).

> Is it supposed to be possible to round-trip display values, and if so, how
> will you avoid losing the trailing space?

In the new patch I'll upload shortly, the display value can be round-tripped. However, by the time I do final MIME header generation, I end up stripping the spaces off the end anyways.

After thinking about some of the issues brought up in the review, I've modified the API slightly. In particular, I've added an inverse to a mailbox object .toString(), and I moved the creation of mailbox objects from the new API to the first part (which allows me to eliminate makeDisplayAddress, FWIW). This implies that a lot of later patches are to be horribly bitrotten, but it should fix more issues up front.
This should fix most of the comments to date; if I've missed any, please point them out to me. I'll upload updated versions of patches that are subsequently bitrotten later.
Attachment #733655 - Attachment is obsolete: true
Attachment #733655 - Flags: ui-review?(bwinton)
Attachment #733655 - Flags: superreview?(neil)
Attachment #733655 - Flags: review?(mconley)
Attachment #740650 - Flags: ui-review?(bwinton)
Attachment #740650 - Flags: superreview?(neil)
Attachment #740650 - Flags: review?(mconley)
Attachment #740650 - Flags: review?(iann_bugzilla)
Attachment #735721 - Attachment is obsolete: true
Attachment #735721 - Flags: review?(neil)
Attachment #741049 - Flags: review?(neil)
The code in nsSmtpProtocol had some massive bitrot.
Attachment #735722 - Attachment is obsolete: true
Attachment #735722 - Flags: review?(irving)
Attachment #741050 - Flags: review?(irving)
Yay for self-bitrot? Parts 3b-3g shouldn't have any issues applying.
Attachment #735733 - Attachment is obsolete: true
Attachment #735733 - Flags: review?(neil)
Attachment #741051 - Flags: review?(neil)
The big change here is that some changes were moved to part 1.
Attachment #735735 - Attachment is obsolete: true
Attachment #735735 - Flags: superreview?(neil)
Attachment #735735 - Flags: review?(irving)
Attachment #741052 - Flags: superreview?(neil)
Attachment #741052 - Flags: review?(irving)
Comment on attachment 741049 [details] [diff] [review]
Part 2: Implement a C++ API for nsIMsgHeaderParser

Review of attachment 741049 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/mime/public/MimeHeaderParser.h
@@ +60,5 @@
> + * @return             The original header with duplicate addresses removed.
> + */
> +void RemoveDuplicateAddresses(const nsACString &aHeader,
> +                              const nsACString &aOtherEmails,
> +                              nsACString &result);

Are these strings explicitly ASCII, or are they UTF-8 or other?

@@ +67,5 @@
> + * Given an RFC 2047-decoded message header, extract all names and email
> + * addresses found in that header into the two arrays.
> + */
> +void ExtractAllAddresses(const nsAString &aHeader, nsTArray<nsString> &names,
> +                         nsTArray<nsCString> &emails);

This overload extracts names to nsString, the one that takes non-decoded headers as input extracts to nsCString (actually UTF-8) - do they need to be inconsistent? Similarly for overloads below.

@@ +79,5 @@
> + * strings and are suitable for display purposes.
> + */
> +void ExtractAllAddresses(const nsACString &aHeader, nsTArray<nsCString> &names,
> +                         nsTArray<nsCString> &emails,
> +                         const char *aCharset = nullptr);

I'm conflicted about using the same overloaded function name to operate on different data - strikes me as easy to make encoded/decoded mistakes. On the other hand, I don't want the function name to go halfway across the screen ;-> I'm not coming up with a good & compact naming pattern off the top of my head, but I'd prefer to have the naming make it more obvious what takes decoded vs. encoded input.
Comment on attachment 741050 [details] [diff] [review]
Part 3a: Use the C++ API in compose

Review of attachment 741050 [details] [diff] [review]:
-----------------------------------------------------------------

Do you mind fixing compile warnings while you're in these files?

/Users/ireid/tbird/comm-central/mailnews/compose/src/nsMsgCompose.cpp:4587:7: warning: variable 'rv'
      is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
  if (NS_SUCCEEDED(parentDir->GetChildNodes(getter_AddRefs(subDirectories))) && subDirectories)
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../../mozilla/dist/include/nsError.h:182:33: note: expanded from macro 'NS_SUCCEEDED'
#define NS_SUCCEEDED(_nsresult) ((bool)MOZ_LIKELY(!NS_FAILED_impl(_nsresult)))
                                ^
/Users/ireid/tbird/comm-central/mailnews/compose/src/nsMsgCompose.cpp:4608:10: note: uninitialized
      use occurs here
  return rv;
         ^~
/Users/ireid/tbird/comm-central/mailnews/compose/src/nsMsgCompose.cpp:4587:3: note: remove the 'if'
      if its condition is always true
  if (NS_SUCCEEDED(parentDir->GetChildNodes(getter_AddRefs(subDirectories))) && subDirectories)
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/ireid/tbird/comm-central/mailnews/compose/src/nsMsgCompose.cpp:4582:3: note: variable 'rv' is
      declared here
  nsresult rv;
  ^


/Users/ireid/tbird/comm-central/mailnews/compose/src/nsMsgSend.cpp:4970:1: warning: delete called on
      'nsMsgAttachmentData' that has virtual functions but non-virtual destructor
      [-Wdelete-non-virtual-dtor]
NS_IMPL_ISUPPORTS1(nsMsgAttachmentData, nsIMsgAttachmentData)
^
../../../mozilla/dist/include/nsISupportsImpl.h:1226:3: note: expanded from macro
      'NS_IMPL_ISUPPORTS1'
  NS_IMPL_RELEASE(_class)                                                     \
  ^
../../../mozilla/dist/include/nsISupportsImpl.h:510:40: note: expanded from macro 'NS_IMPL_RELEASE'
  NS_IMPL_RELEASE_WITH_DESTROY(_class, delete (this))
                                       ^
../../../mozilla/dist/include/nsISupportsImpl.h:490:5: note: expanded from macro
      'NS_IMPL_RELEASE_WITH_DESTROY'
    _destroy;                                                                 \
    ^
/Users/ireid/tbird/comm-central/mailnews/compose/src/nsMsgSend.cpp:5078:1: warning: delete called on
      'nsMsgAttachedFile' that has virtual functions but non-virtual destructor
      [-Wdelete-non-virtual-dtor]
NS_IMPL_ISUPPORTS1(nsMsgAttachedFile, nsIMsgAttachedFile)
^
../../../mozilla/dist/include/nsISupportsImpl.h:1226:3: note: expanded from macro
      'NS_IMPL_ISUPPORTS1'
  NS_IMPL_RELEASE(_class)                                                     \
  ^
../../../mozilla/dist/include/nsISupportsImpl.h:510:40: note: expanded from macro 'NS_IMPL_RELEASE'
  NS_IMPL_RELEASE_WITH_DESTROY(_class, delete (this))
                                       ^
../../../mozilla/dist/include/nsISupportsImpl.h:490:5: note: expanded from macro
      'NS_IMPL_RELEASE_WITH_DESTROY'
    _destroy;                                                                 \
    ^

::: mailnews/compose/src/nsMsgCompFields.cpp
@@ +528,5 @@
> +    if (!aEmailAddressOnly)
> +      MakeDisplayAddress(NS_ConvertUTF8toUTF16(names[i]),
> +        NS_ConvertUTF8toUTF16(address), recipient);
> +    else
> +      CopyUTF8toUTF16(address, recipient);

Octalia and Hexidecalia are united in the Great Struggle.

@@ +552,5 @@
> +  for (uint32_t i = 0; i < numAddresses; ++i)
> +  {
> +    NS_ConvertUTF8toUTF16 utf16Address(addresses[i]);
> +    nsString fullAddress;
> +    MakeMimeAddress(NS_ConvertUTF8toUTF16(names[i]), utf16Address, fullAddress);

Octalia has always been at war with Hexadecalia.

::: mailnews/compose/src/nsMsgCompUtils.cpp
@@ +1347,5 @@
>    if (!string || !*string)
>      return 0;
>  
>    if (addr_p) {
> +    return strdup(string);

Don't suppose the API of this function could change to nsACstring&...

::: mailnews/compose/src/nsMsgSend.cpp
@@ +3036,2 @@
>    // Remove duplicate addresses in recipients
>    nsCString recipients_no_dups;

can these be nsAutoCString?

::: mailnews/compose/src/nsSmtpProtocol.cpp
@@ +1513,5 @@
>      }
>      else
>      {
>        buffer = "RCPT TO:<";
> +      buffer += address.get();

The previous change used "buffer += address;" - is the .get() necessary?
Attachment #741050 - Flags: review?(irving) → review+
Comment on attachment 735728 [details] [diff] [review]
Part 3d: Use the C++ API in mime

Review of attachment 735728 [details] [diff] [review]:
-----------------------------------------------------------------

Again, warnings that aren't your fault but could use attention:

/Users/ireid/tbird/comm-central/mailnews/mime/emitters/src/nsMimePlainEmitter.h:25:19: warning:
      'nsMimePlainEmitter::EndHeader' hides overloaded virtual function [-Woverloaded-virtual]
    NS_IMETHOD    EndHeader();
                  ^
/Users/ireid/tbird/comm-central/mailnews/mime/emitters/src/nsMimeBaseEmitter.h:62:3: note: hidden
      overloaded virtual function 'nsMimeBaseEmitter::EndHeader' declared here
  NS_DECL_NSIMIMEEMITTER
  ^
../../../../mozilla/dist/include/nsIMimeEmitter.h:185:14: note: expanded from macro
      'NS_DECL_NSIMIMEEMITTER'
  NS_IMETHOD EndHeader(const nsACString & name); \
             ^
1 warning generated.
nsMimeHtmlEmitter.cpp
mimecryp.cpp
In file included from /Users/ireid/tbird/comm-central/mailnews/mime/emitters/src/nsMimeXmlEmitter.cpp:8:
/Users/ireid/tbird/comm-central/mailnews/mime/emitters/src/nsMimeXmlEmitter.h:27:19: warning:
      'nsMimeXmlEmitter::EndHeader' hides overloaded virtual function [-Woverloaded-virtual]
    NS_IMETHOD    EndHeader();
                  ^
/Users/ireid/tbird/comm-central/mailnews/mime/emitters/src/nsMimeBaseEmitter.h:62:3: note: hidden
      overloaded virtual function 'nsMimeBaseEmitter::EndHeader' declared here
  NS_DECL_NSIMIMEEMITTER
  ^
../../../../mozilla/dist/include/nsIMimeEmitter.h:185:14: note: expanded from macro
      'NS_DECL_NSIMIMEEMITTER'
  NS_IMETHOD EndHeader(const nsACString & name); \
             ^


/Users/ireid/tbird/comm-central/mailnews/mime/src/mimecms.cpp:29:1: warning: suggest braces around
      initialization of subobject [-Wmissing-braces]
MimeDefClass(MimeEncryptedCMS, MimeEncryptedCMSClass,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/ireid/tbird/comm-central/mailnews/mime/src/mimei.h:218:17: note: expanded from macro
      'MimeDefClass'
 CTYPE CVAR = { cpp_stringify(ITYPE), sizeof(ITYPE), \
                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/ireid/tbird/comm-central/mailnews/mime/src/mimei.h:211:26: note: expanded from macro
      'cpp_stringify'
#define cpp_stringify(x) cpp_stringify_noop_helper(x)
                         ^
/Users/ireid/tbird/comm-central/mailnews/mime/src/mimei.h:210:37: note: expanded from macro
      'cpp_stringify_noop_helper'
#define cpp_stringify_noop_helper(x)#x
                                    ^
<scratch space>:8:1: note: expanded from macro '#'
"MimeEncryptedCMS"
^

/Users/ireid/tbird/comm-central/mailnews/mime/src/mimedrft.cpp:467:5: warning: delete called on
      'nsMsgAttachedFile' that has virtual functions but non-virtual destructor
      [-Wdelete-non-virtual-dtor]
    delete attachments[i];
    ^
/Users/ireid/tbird/comm-central/mailnews/mime/src/mimedrft.cpp:460:21: warning: comparison of
      integers of different signs: 'int' and 'size_type' (aka 'unsigned int') [-Wsign-compare]
  for (int i = 0; i < attachments.Length(); i++)
                  ~ ^ ~~~~~~~~~~~~~~~~~~~~
/Users/ireid/tbird/comm-central/mailnews/mime/src/mimedrft.cpp:1566:3: warning: delete called on
      'nsMsgAttachedFile' that has virtual functions but non-virtual destructor
      [-Wdelete-non-virtual-dtor]
  delete mdd->messageBody;
  ^
/Users/ireid/tbird/comm-central/mailnews/mime/src/mimedrft.cpp:1568:21: warning: comparison of
      integers of different signs: 'int' and 'size_type' (aka 'unsigned int') [-Wsign-compare]
  for (int i = 0; i < mdd->attachments.Length(); i++)
                  ~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/ireid/tbird/comm-central/mailnews/mime/src/mimedrft.cpp:1923:22: warning: comparison of
      integers of different signs: 'uint32_t' (aka 'unsigned int') and 'int32_t' (aka 'int')
      [-Wsign-compare]
    if (bytesWritten < size)
        ~~~~~~~~~~~~ ^ ~~~~

::: mailnews/mime/src/mimecms.cpp
@@ +134,5 @@
>    }
>    return false;
>  }
>  
>  // extern MimeMessageClass mimeMessageClass;      /* gag */

Might as well remove this

@@ +142,2 @@
>  {
> +  mozilla::mailnews::ExtractFirstAddress(nsDependentCString(line), name, address);

Can we just get rid of this method, or is it used by extensions?

::: mailnews/mime/src/mimedrft.cpp
@@ +626,5 @@
>  
>  /* given an address string passed though parameter "address", this one will be converted
>     and returned through the same parameter. The original string will be destroyed
>  */
> +static void UnquoteMimeAddress(char **address, const char *charset)

I know this is how you found the API, but it's terrible. Can we change this to take nsACstring modified in place, or separate input and output parameters?

@@ +635,5 @@
> +    mimeHeader.Adopt(*address);
> +
> +    nsAutoCString fullAddress;
> +    nsTArray<nsCString> addresses;
> +    // XXX: charset

Leftover comment, or is there still something to fix here?
Attachment #735728 - Flags: review?(irving) → review+
Comment on attachment 741052 [details] [diff] [review]
Part 5: Introduce the new API to nsIMsgHeaderParser

Review of attachment 741052 [details] [diff] [review]:
-----------------------------------------------------------------

I think the jsval implementation of the email address / group union is the least bad solution, given XPCOM's lack of support for array attributes.
Attachment #741052 - Flags: review?(irving) → review+
(In reply to :Irving Reid from comment #35)
> ::: mailnews/mime/public/MimeHeaderParser.h
> @@ +60,5 @@
> > + * @return             The original header with duplicate addresses removed.
> > + */
> > +void RemoveDuplicateAddresses(const nsACString &aHeader,
> > +                              const nsACString &aOtherEmails,
> > +                              nsACString &result);
> 
> Are these strings explicitly ASCII, or are they UTF-8 or other?

These are UTF-8. 

> @@ +67,5 @@
> > + * Given an RFC 2047-decoded message header, extract all names and email
> > + * addresses found in that header into the two arrays.
> > + */
> > +void ExtractAllAddresses(const nsAString &aHeader, nsTArray<nsString> &names,
> > +                         nsTArray<nsCString> &emails);
> 
> This overload extracts names to nsString, the one that takes non-decoded
> headers as input extracts to nsCString (actually UTF-8) - do they need to be
> inconsistent? Similarly for overloads below.

As your later comments say, we are in a war between Octalia and Hexalia here. Descriptively, almost everyone who has an nsAString wants an nsAString name and almost everyone who has an nsACString wants nsACString results. The biggest inconsistency is in getting the emails--some people want it as an nsCString anyways while others want it as nsString.

> I'm conflicted about using the same overloaded function name to operate on
> different data - strikes me as easy to make encoded/decoded mistakes. On the
> other hand, I don't want the function name to go halfway across the screen
> ;-> I'm not coming up with a good & compact naming pattern off the top of my
> head, but I'd prefer to have the naming make it more obvious what takes
> decoded vs. encoded input.

With one exception (namely nsMsgCompFields::SplitRecipients*, may it rot forever), everyone who has an nsAString of a header got it by calling Mime2Decoded*.

(In reply to :Irving Reid from comment #37)
> Again, warnings that aren't your fault but could use attention:

My general policy on warnings is to fix them if I'm already around that code and they're simple enough to fix...

> /Users/ireid/tbird/comm-central/mailnews/mime/src/mimecms.cpp:29:1: warning:
> suggest braces around
>       initialization of subobject [-Wmissing-braces]
> MimeDefClass(MimeEncryptedCMS, MimeEncryptedCMSClass,
> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

... and this is one of the less trivial ones to fix (I have to unroll that magic macro).

> /Users/ireid/tbird/comm-central/mailnews/mime/src/mimedrft.cpp:467:5:
> warning: delete called on
>       'nsMsgAttachedFile' that has virtual functions but non-virtual
> destructor
>       [-Wdelete-non-virtual-dtor]
>     delete attachments[i];
>     ^

Judicious use of extra MOZ_FINAL should be sufficient to silence these.

> ::: mailnews/mime/src/mimedrft.cpp
> @@ +626,5 @@
> >  
> >  /* given an address string passed though parameter "address", this one will be converted
> >     and returned through the same parameter. The original string will be destroyed
> >  */
> > +static void UnquoteMimeAddress(char **address, const char *charset)
> 
> I know this is how you found the API, but it's terrible. Can we change this
> to take nsACstring modified in place, or separate input and output
> parameters?

inout parameters are fun. That said, it looks like a lot of the surrounding code could really use some healthy nsCString-ification.

> @@ +635,5 @@
> > +    mimeHeader.Adopt(*address);
> > +
> > +    nsAutoCString fullAddress;
> > +    nsTArray<nsCString> addresses;
> > +    // XXX: charset
> 
> Leftover comment, or is there still something to fix here?

Oh, leftover.

(In reply to :Irving Reid from comment #36)
> ::: mailnews/compose/src/nsMsgCompUtils.cpp
> @@ +1347,5 @@
> >    if (!string || !*string)
> >      return 0;
> >  
> >    if (addr_p) {
> > +    return strdup(string);
> 
> Don't suppose the API of this function could change to nsACstring&...

[grep]. Few enough uses that it's a simple stringification change.
I have to admit I am a little slow getting into this. But I have a hard time understanding, as a potential use of these interfaces, how changes like this are viewed as an improvement:

-        var fromMailboxes = MailServices.headerParser.extractHeaderAddressMailboxes(
-            currentHeaderData.from.headerValue);
+        let fromMailboxes = [x.email for (x of
+          MailServices.headerParser.parseDecodedHeader(
+            currentHeaderData.from.headerValue))].join(",");

OK I suppose I am supposed to read all about it to understand it. But can you explain to me what we are gaining by adding all of this complexity? The use of jsval just makes it even harder to understand. That is not commonly used in mailnews code, and now you are asking me to understand something new just to decode a header.

Is there a short answer as to why all of this new complexity is needed?
(In reply to Kent James (:rkent) from comment #40)
> OK I suppose I am supposed to read all about it to understand it. But can
> you explain to me what we are gaining by adding all of this complexity? The
> use of jsval just makes it even harder to understand. That is not commonly
> used in mailnews code, and now you are asking me to understand something new
> just to decode a header.
> 
> Is there a short answer as to why all of this new complexity is needed?

The main purpose of removing most of the extract* methods is to clearly distinguish between cases where the value has already been RFC 2047-decoded and the cases where it has not. In an ideal world, we'd store the 2047-decoded form in the database and not need to bother with decoding/encoding except when parsing/emitting data, but that ship long ago sailed.

The other methods are designed to reflect a more accurate model of the format of an addressing header (i.e., it allows for the existence of groups; the fact that the old header parser poorly supported these forms causes interesting contortions in our code).
I've been trying to make some progress in reviews here, but I can't get a version to compile. The all patch of 4/1 seems to require patches (bug 856914?) that landed after 4/1.  I've tried a couple of fixes myself, but it would be nice if you could provide a non-bit rotted all patch that will compile.
Attachment #732176 - Attachment is obsolete: true
Comment on attachment 740650 [details] [diff] [review]
Part 1: Distinguish between building strings for display or MIME

Review of attachment 740650 [details] [diff] [review]:
-----------------------------------------------------------------

Nothing in here strikes me as unreasonable, although I'll admit that certain chunks of it go over my head. Assuming Joshua was interested in me checking out the interactions with the address book stuff though, it all looks good.

::: mailnews/addrbook/src/nsAbLDAPAutoCompFormatter.cpp
@@ -41,5 @@
>                                    nsIAutoCompleteItem **aItem) 
>  {
>      nsresult rv;
>  
> -    nsCOMPtr<nsIMsgHeaderParser> msgHdrParser = 

Ahhh, deCOMtamination... feels good. :)

::: mailnews/addrbook/src/nsAbLDAPAutoCompleteSearch.js
@@ +119,5 @@
>      });
>    },
>  
>    _addToResult: function _addToResult(card) {
> +    var emailAddress = this._parser.makeMailboxObject(card.displayName,

let instead of var, please.
Attachment #740650 - Flags: review?(mconley) → review+
(In reply to Joshua Cranmer [:jcranmer] from comment #29)
> (In reply to Ian Neal from comment #25)
> > The old code used to deal properly with lines containing comma separated
> > entries, the new code does not. If you drag two existing entries onto the
> > same line separated by a comma, it will mangle the display name in the
> > second entries card in the address book.
> 
> It actually turns out, since the dialog is modal, that you can't drag to it
> on Windows, but you can on Linux. And the more I dig into this code, the
> more I have to restrain myself from rewriting the entire UI of Thunderbird.
> Lots of swearing is in order here, but I have a fix for this in a new patch
> I'll be uploading shortly.
Did the latest Patch 1 contain this fix, as it still seems to exhibit the same behaviour for me?
Looking at "Part 3f: Use the C++ API in db", the essence of this seems to be eliminating the decoding of addresses in nsParseMailMessageState::FinalizeHeaders() before they are stored in the msgHdr, for example here:

       if (ccList)
       {
-        uint32_t numAddresses;
-        char  *names;
-        char  *addresses;
-
-        ret = m_HeaderAddressParser->ParseHeaderAddresses(ccList->value,
-                                                          &names, &addresses,
-                                                          &numAddresses);
-        if (NS_SUCCEEDED(ret) && numAddresses > 0)
-        {
-          m_newMsgHdr->SetCCListArray(names, addresses, numAddresses);
-          PR_Free(addresses);
-          PR_Free(names);
-        }
-        else  // hmm, should we just use the original string?
-          m_newMsgHdr->SetCcList(ccList->value);
+        m_newMsgHdr->SetCcList(ccList->value);
       }

Maybe I am missing something, but doesn't that leave RFC 2047 encoded values in the msgHdr instead of decoded values? If so, that has pretty wide ramifications for people who are using those values.
(In reply to Kent James (:rkent) from comment #45)
> Maybe I am missing something, but doesn't that leave RFC 2047 encoded values
> in the msgHdr instead of decoded values? If so, that has pretty wide
> ramifications for people who are using those values.

ParseHeaderAddresses does not do any RFC 2047 decoding (only parseHeadersWithArray does that, wonderful API incongruences).
Comment on attachment 740650 [details] [diff] [review]
Part 1: Distinguish between building strings for display or MIME

Well, I tested out the try build, and I couldn't seem to break it.  (It also handled "A Group: a@b.c, Blake <bwinton@foo.com>" correctly!  ;)

Thanks,
Blake.
Attachment #740650 - Flags: ui-review?(bwinton) → ui-review+
Comment on attachment 740650 [details] [diff] [review]
Part 1: Distinguish between building strings for display or MIME

I still have the issue on Linux using an existing mailing list.
STR
1/ Have a mailing list with a couple of entries
2/ Edit mailing list
3/ Drag and drop an entry onto an empty line in the mailing list, manually type a comma and then drag and drop another entry onto the same line.
4/ Close mailing list dialog and review entries in addressbook

Expected result
1/ No change to the display name in any of the entries

Actual result
1/ Display name for the second entry added now contains display name, email address and comma as well as only display name.
Attachment #740650 - Flags: review?(iann_bugzilla) → review-
Comment on attachment 735730 [details] [diff] [review]
Part 3f: Use the C++ API in db

This bug is essentially a reversal of a change that was done way early in the history of Thunderbird (to parse the headers after they are received, and then reassemble them into an array). I've tried to figure out if there is any good reason to do that, and can't come up with a reason, so I'll just accept the change. Other details of the patch are all fine.
Attachment #735730 - Flags: review?(kent) → review+
Comment on attachment 735724 [details] [diff] [review]
Part 3b: Use the C++ API in search

I cannot really complete this review until the issue of overloading of ExtractAllAddresses is resolved as mentioned in Comment 35. For the record, I don't think it is a good idea to simply use the difference of type between a String and CString to determine whether RFC 2047 decoding is done, so I would much prefer that be changed.

On another issue, charsetOverride used to be propagated through the searches, but that has been removed in this patch. What is the justification for that?
(In reply to Kent James (:rkent) from comment #50)
> On another issue, charsetOverride used to be propagated through the
> searches, but that has been removed in this patch. What is the justification
> for that?

I should have explained this part earlier, my apologies.

In the context of header decoding, charsetOverride being true means "ignore the declared charset in the 2047 decoding word and use this other charset instead, always." My plans for reimplementing this in JSMime is to kill this functionality, since the use of RFC 2047 encoded words is a fairly explicit opt-in; thus, it is hard to accidentally slap on the wrong label. Since those attributes are going to become a nop anyways, I went about killing it on APIs I was already touching.
Attached patch New C++ API interdiff (obsolete) — Splinter Review
This is a patch for a new C++ API I've been playing with to solve the issue brought up in comment 35. From a user's perspective, a call looks like this:

ExtractMailboxes(DecodedHeader(to), emails);

or

ExtractDisplayAddresses(EncodedHeader(to, charset), addresses);

[These are all in the mozilla::mailnews namespace, but liberal use of using namespace mozilla::mailnews; fixes that].

Arrays can either be nsTArray<nsCString> or nsTArray<nsString> as out parameters, as well; a magic temporary handles the UTF conversion as appropriate.

This patch is currently applied on top of the last patch in my queue that touches MimeHeaderParser.cpp (part 2 of the next bug), and its design is unfortunately ill-suited for the pre-part 5 version of the interface.

Still, for people who have expressed concerns about the API issues, what do you think?
As I look at the issues in attachment 754183 [details] [diff] [review], the main proposal as I understand it is to solve the issues of comment 35 by using C++ type checking, and wrapping all uses of ambiguous encoded/decoded headers with a required EncodedHeader or DecodedHeader wrapper. This is in fact only for clarification, as the underlying separation of ACString->encoded and AString->decoded remains (but becomes an implementation detail).

Reading through the code, I like the approach. That is, as I look at examples like:

     nsString fullHeader;
     if (NS_FAILED(aHdr->GetMime2DecodedAuthor(fullHeader)))
       return false;
 
-    mozilla::mailnews::ExtractName(fullHeader, author);
+    ExtractName(DecodedHeader(fullHeader), author);

then I immediately find myself asking the question "is fullHeader decoded or not?", which is a good thing from a correctness point of view. The strong typing forces that question.

I still have some misgivings about the overall philosophy of this patch, which seems to be developing a lot of parallel code to provide optimal Javascript and C++ implementations, even though looking narrowly at the javascript or C++ side I can see the advantages. In the process, you are introducing concepts in both the C++ and Javascript side that, while legal, are not common mailnews coding practice, and therefore force the poor user (such as myself) to have to understand new concepts to debug things (like jsval in .idl files, and "magic temporaries" in the new C++ files). I've been on the opposite side of this argument before (like trying to allow use of do {} while false; loops for error management), but I'd like to understand that there is a significant benefit that justifies both the need to maintain parallel implementations, and the need to learn new concepts. So far, I find myself not looking forward to having to learn new things just to do simple operations, rather than looking forward to how this is going to make my life easier. But I am only one opinion here, and it seems like the horse has already left the barn, so I will leave my state as "having misgivings" rather than actively opposing the overall approach. (In my own code, which needs to freely move from C++ to javascript, I find myself using XPCOM more than core code does, so I am clearly out of step a little with the deCOMtamination movement in Mozilla).

But for the immediate issue of comment 35, the approach in the current attachments successfully addresses the issue.
(In reply to Ian Neal from comment #44)
> (In reply to Joshua Cranmer [:jcranmer] from comment #29)
> > (In reply to Ian Neal from comment #25)
> > > The old code used to deal properly with lines containing comma separated
> > > entries, the new code does not. If you drag two existing entries onto the
> > > same line separated by a comma, it will mangle the display name in the
> > > second entries card in the address book.
> > 
> > It actually turns out, since the dialog is modal, that you can't drag to it
> > on Windows, but you can on Linux. And the more I dig into this code, the
> > more I have to restrain myself from rewriting the entire UI of Thunderbird.
> > Lots of swearing is in order here, but I have a fix for this in a new patch
> > I'll be uploading shortly.
> Did the latest Patch 1 contain this fix, as it still seems to exhibit the
> same behaviour for me?

After some more careful investigation, I have come to the following realization: the "drag and drop" here appears to be the Linux window manager handling it is yet another form of text entry (if you position the drop target just right, you can get the text to "drop" in the middle of the internal string). One of the goals I had in the redesign was to strictly limit each line in the dialog box to a single display name pair--that makes a lot of quoting issues for display names a lot easier.

Since this UX is impossible on one of our major platforms, would you be fine with just dropping support for this?
Flags: needinfo?(iann_bugzilla)
Dropping support I presume you mean dropping support for adding multiple addresses to a mailing list on a single line? If so, need to stop a user from doing this, how easy is that?
Flags: needinfo?(iann_bugzilla)
Allowing and handling multiple addresses on a single line (iirc via comma or semi-colon) was something added "recently", i.e. post 2.0, can't quite remember when. Given that users can type in a single line with commas (or copy and paste, not just drag), I think we need to keep supporting it.
In lieu of Standard8's comments, I've decided to add back support in for multiple address per line. Here is the interdiff between the two versions of the patch...
... and the updated patch itself.
Attachment #740650 - Attachment is obsolete: true
Attachment #740650 - Flags: superreview?(neil)
Attachment #798696 - Flags: superreview?(neil)
Attachment #798696 - Flags: review?(iann_bugzilla)
And an updated version of part 2, that takes into account the new C++ API, akin to attachment 754183 [details] [diff] [review]
Attachment #741049 - Attachment is obsolete: true
Attachment #754183 - Attachment is obsolete: true
Attachment #741049 - Flags: review?(neil)
Attachment #798697 - Flags: review?(neil)
While I think the changes here from the last version aren't all that different (just applying the portion of the new API several patches earlier), this is still by far the messiest set of part 3 changes, so at least getting another look-over would be helpful.
Attachment #741050 - Attachment is obsolete: true
Attachment #798698 - Flags: review?(irving)
Attachment #735724 - Attachment is obsolete: true
Attachment #735724 - Flags: review?(kent)
Attachment #798699 - Flags: review?(kent)
A few general drive-by comments as I try to review this again:

 interface nsIMsgHeaderParser : nsISupports {
+  /**
+   * Make a single instance of a parsed address array representing a mailbox.
+   *
+   * Objects created by this method will act like the mailbox objects returned
+   * from the parse*Header methods, including having the toString method.
+   */
+  msgIAddressObject makeMailboxObject(in AString aName, in AString aEmail);

I have no idea what this means: "single instance of a parsed address array representing a mailbox." The description of the interface itself "An element of the addressing arrays" doesn't really help either. I guess I have no idea what an "addressing array" is so these descriptions don't help, nor do I understand the significance of this being "parsed". "mailbox objects returned from the parse*Header methods" assumes that the reader understands in depth some of the implementation details of addressing, just so that she can understand the definition of what should be a fairly simple object.

How about "Single object combining the name and address portion of a mailbox, useful for making arrays of mailbox items". And why doesn't "makeMailboxObject" make an msqIMailboxObject?

The description also needs to make it clear whether these objects have any encoding, either expected in the inputs or present in the outputs.
Another drive-by comment:

+/**
+ * Given a raw message header value, extract all the mailboxes into an array.
+ *
+ * All duplicate email addresses are removed from the output list.
+ */
+void ExtractMailboxes(const ParsedHeader &aHeader, AnyStringArray emails);

So now an "email" is the same as a "mailbox"? RFC 5322 defines:

  "Normally, a mailbox is composed of two parts: (1) an optional display
   name that indicates the name of the recipient (which can be a person
   or a system) that could be displayed to the user of a mail
   application, and (2) an addr-spec address enclosed in angle brackets"

so I don't think that you should be using the term "mailbox" as a synonym for "address spec" as opposed to the "display name". I really think that you should strive to consistently use the term "mailbox" to mean the combination of an (optional) display name and an address spec, which would mean that "makeMailbboxObject" should make an msgIMailboxObject for consistency, and the above "ExtractMailboxes" is misnamed.
(In reply to Kent James (:rkent) from comment #62)
> I have no idea what this means: "single instance of a parsed address array
> representing a mailbox." The description of the interface itself "An element
> of the addressing arrays" doesn't really help either. I guess I have no idea
> what an "addressing array" is so these descriptions don't help, nor do I
> understand the significance of this being "parsed". "mailbox objects
> returned from the parse*Header methods" assumes that the reader understands
> in depth some of the implementation details of addressing, just so that she
> can understand the definition of what should be a fairly simple object.

So, part of the problem is that I originally added these methods in the patch for part 5, and after one cycle of review comments, I decided to move them to part 1 retaining their documentation at the time. Although, on careful reading of my path for part 5 again, I think the documentation is still confusing. (Note that nsIMsgHeaderParser, as of part 5, consists largely of parseEncodedHeader and parseDecodedHeader methods plus some utility stuff).

> How about "Single object combining the name and address portion of a
> mailbox, useful for making arrays of mailbox items". And why doesn't
> "makeMailboxObject" make an msqIMailboxObject?

This is where things get complicated. Structurally speaking, the new API is designed around manipulating addressing headers in either their structural form (which is to say, a list of either mailbox or group objects) or their MIME form, and is biased towards making JS use easier. I was eventually persuaded to use XPIDL arrays of msgIAddressObject ("address" being the RFC 5322 term for either-a-group-or-a-mailbox), the idea that one of these objects can either be a mailbox object or a group object. Since no users of C++ ever query a group object, I opted to drop pretense of group objects on the IDL for now.

> The description also needs to make it clear whether these objects have any
> encoding, either expected in the inputs or present in the outputs.

Ideally, RFC 2047 encoding would only be leaked to a small portion of the codebase. But since our database stores everything as 2047-encoded words (yuck), I have to have a helper on this interface. I'll tighten up the documentation in part 5 to make it clear that parseEncodedHeader is the only thing that knows about RFC 2047 and everyone else assumes encoded words don't exist.

(In reply to Kent James (:rkent) from comment #63)
> +void ExtractMailboxes(const ParsedHeader &aHeader, AnyStringArray emails);
> 
> So now an "email" is the same as a "mailbox"? RFC 5322 defines:

Oh dear. That is a serious lapse on my part. I'll rename it to ExtractEmails...
Comment on attachment 798696 [details] [diff] [review]
Part 1: Distinguish between building strings for display or MIME

>+          decodedName = NS_ConvertUTF8toUTF16(pNames);
CopyUTF8toUTF16(nsDependentCString(pNames), decodedName);

>+        recipient = NS_ConvertUTF8toUTF16(pAddresses);
Ditto.

(NS_ConvertUTF8toUTF16 is a class, so it creates an nsAutoString temporary which then gets assigned to decodedName. This is inefficient.)

>+        decodedName = nsDependentCString(pNames);
Dependent strings don't work like that. The readable solution is to simply assign, i.e. decodedName.Assign(pNames);
The unreadable solution is to declare nsDependentCString decodedName(pNames) and then if you have something better you can assign it in its place.
The sneaky solution is to copy bz's Rebind from nsCString_internal to nsCString_external and use that instead.
Attachment #798696 - Flags: superreview?(neil) → superreview+
Comment on attachment 798697 [details] [diff] [review]
Part 2: Implement a C++ API for nsIMsgHeaderParser

This code seems somewhat inefficient to me. For instance, consider ExtractName(DecodedHeader(header), name);
DecodedHeader calls ParseHeadersWithArray and converts the array of strings into MsgAddressObjects. (If the intent is that the new parser creates MsgAddressObjects directly, this isn't so bad).
ExtractName calls ExtractFirstAddress which calls ExtractAllAddresses which converts the MsgAddressObjects back into strings, and then AnyStringArray converts all the strings into UTF-8. It then returns the first name (or the first email if it has no name).
At the very least I think you need to ditch AnyStringArray. If you do need an nsTArray<nsCString> somewhere down the line then you could write a helper to convert the whole array to UTF-8.

>+struct ParsedHeader
>+{
>+  ~ParsedHeader();
>+  uint32_t mCount;
>+  msgIAddressObject **mAddresses;
>+};
This structure has a nontrivial destructor but neither constructor nor copy operator. This means that you need to be at the very least careful, if not lucky, that you don't free invalid memory. In particular you might be relying on copy elision e.g. NRVO.

>+  names.Clear();
>+  names.InsertElementsAt(0, count);
names.SetLength(count);
(similarly in other places)
Attachment #798697 - Flags: review?(neil) → review-
Attachment #741052 - Flags: superreview?(neil) → superreview+
Comment on attachment 741051 [details] [diff] [review]
Part 4: Kill [noscript] on nsIMsgHeaderParser

>-  nsCOMPtr<nsIMsgHeaderParser> headerParser = services::GetHeaderParser();
>+  nsMsgHeaderParser headerParser;
>   nsCOMPtr<nsIMimeConverter> converter = services::GetMimeConverter();
> 
>   nsCString nameBlob, emailBlob;
>   uint32_t count = 0;
>-  headerParser->ParseHeaderAddresses(PromiseFlatCString(aHeader).get(),
>+  headerParser.ParseHeaderAddresses(PromiseFlatCString(aHeader).get(),
>     getter_Copies(nameBlob), getter_Copies(emailBlob), &count);
I don't like the way you create an instance of an ostensibly refcounted object to call one of those three methods. Can they not be made static?
(In reply to neil@parkwaycc.co.uk from comment #66)
> Comment on attachment 798697 [details] [diff] [review]
> Part 2: Implement a C++ API for nsIMsgHeaderParser
> 
> This code seems somewhat inefficient to me. For instance, consider
> ExtractName(DecodedHeader(header), name);
> DecodedHeader calls ParseHeadersWithArray and converts the array of strings
> into MsgAddressObjects. (If the intent is that the new parser creates
> MsgAddressObjects directly, this isn't so bad).

The new parser will create them directly, see part 5.

> ExtractName calls ExtractFirstAddress which calls ExtractAllAddresses which
> converts the MsgAddressObjects back into strings, and then AnyStringArray
> converts all the strings into UTF-8. It then returns the first name (or the
> first email if it has no name).

The methods which extract only one name/email are almost always the From header, which in practice only has a single address anyways (it can have more per RFC 5322, but considering how much of our code assumes it only ever has one and the lack of bugs filed about it, I think it's safe to say that having more is extremely rare). The other methods, which only grab one of the arrays, are careful to use UTF-16 for the arrays they ignore, so no extra conversion goes on.

> At the very least I think you need to ditch AnyStringArray. If you do need
> an nsTArray<nsCString> somewhere down the line then you could write a helper
> to convert the whole array to UTF-8.

The problem is that so many people need to use nsTArray<nsCString>--more, I think, than those who need UTF-16. The bigger problem is that our entire infrastructure is indecisive as to when to use UTF-8 or when to use UTF-16--there are places where we end up converting the string back or forth at least 5 times due to changing APIs constantly.

> >+struct ParsedHeader
> >+{
> >+  ~ParsedHeader();
> >+  uint32_t mCount;
> >+  msgIAddressObject **mAddresses;
> >+};
> This structure has a nontrivial destructor but neither constructor nor copy
> operator. This means that you need to be at the very least careful, if not
> lucky, that you don't free invalid memory. In particular you might be
> relying on copy elision e.g. NRVO.

A deleted copy constructor and assignment operator along with a move constructor should solve the problem nicely.
(In reply to Joshua Cranmer from comment #68)
> (In reply to comment #66)
> > This code seems somewhat inefficient to me. For instance, consider
> > ExtractName(DecodedHeader(header), name);
> > DecodedHeader calls ParseHeadersWithArray and converts the array of strings
> > into MsgAddressObjects. (If the intent is that the new parser creates
> > MsgAddressObjects directly, this isn't so bad).
> 
> The new parser will create them directly, see part 5.

I don't see anything being created in part 5.

> > At the very least I think you need to ditch AnyStringArray. If you do need
> > an nsTArray<nsCString> somewhere down the line then you could write a helper
> > to convert the whole array to UTF-8.
> 
> The problem is that so many people need to use nsTArray<nsCString>--more, I
> think, than those who need UTF-16. The bigger problem is that our entire
> infrastructure is indecisive as to when to use UTF-8 or when to use
> UTF-16--there are places where we end up converting the string back or forth
> at least 5 times due to changing APIs constantly.

And there were still several back-and-forth conversions in a couple of your Part 3 patches.
Comment on attachment 798698 [details] [diff] [review]
Part 3a: Use the C++ API in compose

Review of attachment 798698 [details] [diff] [review]:
-----------------------------------------------------------------

Nice reduction in the need for 8<->16 conversions, unfortunate that we now need a few more EncodedHeader()s. I appreciate that we can't completely get rid of them without lots of breaking API changes.

::: mailnews/compose/src/nsMsgCompFields.cpp
@@ +508,5 @@
> +  if (aEmailAddressOnly)
> +    ExtractMailboxes(header, results);
> +  else
> +    ExtractDisplayAddresses(header, results);
> +  

White space

@@ +510,5 @@
> +  else
> +    ExtractDisplayAddresses(header, results);
> +  
> +  uint32_t count = results.Length();
> +  PRUnichar **result = (PRUnichar **)NS_Alloc(sizeof(PRUnichar *) * count);

Juggling between variables named "results" and "result" is confusing - can you rename one?
Attachment #798698 - Flags: review?(irving) → review+
Comment on attachment 798696 [details] [diff] [review]
Part 1: Distinguish between building strings for display or MIME

Just nits.

>+++ b/mail/components/compose/content/addressingWidgetOverlay.js
>@@ -285,23 +290,17 @@ function awSetInputAndPopup(inputValue, 
> 
> function awSetInputAndPopupFromArray(inputArray, popupValue, parentNode, templateNode)
> {
>   if (popupValue)
>   {
>     var recipient;
>     for (var index = 0; index < inputArray.length; index++)
>     {
>+      recipient = inputArray[index];
>       _awSetInputAndPopup(recipient, popupValue, parentNode, templateNode);
Nit: can you not just inline recipient and then braces are not needed either.
>     }


>+++ b/mailnews/addrbook/content/abMailListDialog.js
>       if (cardproperty)
>       {
>+        let addrObjects = MailServices.headerParser
>+                                      .makeFromDisplayAddress(fieldValue, {});
>+        for (let j = 0; j < addrObjects.length; j++) {
Nit: style within the file appears to be brace on the following line rather than the same line.
>           if (j > 0)
>           {

>@@ -530,22 +527,31 @@ function DropOnAddressListTree(event)

>+  for (let full of fullNames.value) {
Nit: style within the file appears to be brace on the following line rather than the same line.

>+++ b/mailnews/addrbook/src/nsAbAutoCompleteSearch.js
>   _addToResult: function _addToResult(commentColumn, directory, card,
>                                       emailToUse, isPrimaryEmail, result) {
>-    var emailAddress =
>-      this._parser.makeFullAddress(card.displayName,
>-                                   card.isMailList ?
>-                                   card.getProperty("Notes", "") || card.displayName :
>-                                   emailToUse);
>-
>-    // The old code used to try it manually. I think if the parser can't work
>-    // out the address from what we've supplied, then its busted and we're not
>-    // going to do any better doing it manually.
>-    if (!emailAddress)
>-      return;
>+    var emailAddress = this._parser.makeMailboxObject(card.displayName,
>+      card.isMailList ? card.getProperty("Notes", "") || card.displayName
>+                      : emailToUse).toString();
Urgh, this looks ugly, but it was not much prettier before. I think the previous style was slightly easier to read though.

>+++ b/mailnews/addrbook/src/nsAbLDAPAutoCompleteSearch.js
>@@ -105,25 +105,19 @@ nsAbLDAPAutoCompleteSearch.prototype = {
>     var lcEmailAddress = emailAddress.toLocaleLowerCase();
> 
>     return this._result._searchResults.some(function(result) {
>       return result.value.toLocaleLowerCase() == lcEmailAddress;
>     });
>   },
> 
>   _addToResult: function _addToResult(card) {
>+    let emailAddress = this._parser.makeMailboxObject(card.displayName,
>+      card.isMailList ? card.getProperty("Notes", "") || card.displayName
>+                      : card.primaryEmail).toString();
Shame that we have separate, similar code in nsAbAutoCompleteSearch as to this. In the future...
Same issue with the style as mentioned above.

>+++ b/suite/mailnews/compose/addressingWidgetOverlay.js

>     var recipient;
>     for (var index = 0; index < inputArray.length; index++)
>     {
>+      recipient = inputArray[index];
>       _awSetInputAndPopup(recipient, popupValue, parentNode, templateNode);
Nit: can you not just inline recipient and then braces are not needed either.
>     }

r=me with those addressed / commented on.
Attachment #798696 - Flags: review?(iann_bugzilla) → review+
Comment on attachment 798699 [details] [diff] [review]
Part 3b: Use the C++ API in search

Review of attachment 798699 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me.
Attachment #798699 - Flags: review?(kent) → review+
(In reply to Ian Neal from comment #71)
> Comment on attachment 798696 [details] [diff] [review]
> Part 1: Distinguish between building strings for display or MIME
> 
> Just nits.
> 
> >+++ b/mail/components/compose/content/addressingWidgetOverlay.js
> >@@ -285,23 +290,17 @@ function awSetInputAndPopup(inputValue, 
> > 
> > function awSetInputAndPopupFromArray(inputArray, popupValue, parentNode, templateNode)
> > {
> >   if (popupValue)
> >   {
> >     var recipient;
> >     for (var index = 0; index < inputArray.length; index++)
> >     {
> >+      recipient = inputArray[index];
> >       _awSetInputAndPopup(recipient, popupValue, parentNode, templateNode);
> Nit: can you not just inline recipient and then braces are not needed either.

I decided to just use a for-of loop instead.

(In reply to neil@parkwaycc.co.uk from comment #69)
> (In reply to Joshua Cranmer from comment #68)
> > (In reply to comment #66)
> > > This code seems somewhat inefficient to me. For instance, consider
> > > ExtractName(DecodedHeader(header), name);
> > > DecodedHeader calls ParseHeadersWithArray and converts the array of strings
> > > into MsgAddressObjects. (If the intent is that the new parser creates
> > > MsgAddressObjects directly, this isn't so bad).
> > 
> > The new parser will create them directly, see part 5.
> 
> I don't see anything being created in part 5.

The API that it will use is in part 5. After some reflection, I'm making some major changes to part 4 and part 5 to use the new API earlier.

> And there were still several back-and-forth conversions in a couple of your
> Part 3 patches.

Mostly compose, as I recall, which is largely a casualty of always choosing the wrong encoding to use (it wants to store everything as UTF-16 but process it as UTF-8 and then hand it out as UTF-16...).
(In reply to Joshua Cranmer from comment #73)
> (In reply to comment #69)
> > (In reply to Joshua Cranmer from comment #68)
> > > (In reply to comment #66)
> > > > This code seems somewhat inefficient to me. For instance, consider
> > > > ExtractName(DecodedHeader(header), name);
> > > > DecodedHeader calls ParseHeadersWithArray and converts the array of strings
> > > > into MsgAddressObjects. (If the intent is that the new parser creates
> > > > MsgAddressObjects directly, this isn't so bad).
> > > 
> > > The new parser will create them directly, see part 5.
> > 
> > I don't see anything being created in part 5.
> 
> The API that it will use is in part 5.

I'm not following. When you said "The new parser will create them directly, see part 5," I assumed that part 5 would see the new parser creating them directly.
Attachment #798697 - Attachment is obsolete: true
Attachment #808935 - Flags: review?(neil)
Attachment #735726 - Attachment is obsolete: true
Attachment #808936 - Flags: review?(mbanner)
Attached patch Part 3e: Use the C++ API in base (obsolete) — Splinter Review
Attachment #735729 - Attachment is obsolete: true
Attachment #735729 - Flags: review?(neil)
Attachment #808938 - Flags: review?(neil)
Attachment #735731 - Attachment is obsolete: true
Attachment #735731 - Flags: review?(neil)
Attachment #808939 - Flags: review?(neil)
I folded parts 4 and 5 into a single patch, and gave a C++ implementation for most of it, merely excluding supports for groups because that would require a massive set of changes for something that would hopefully disappear shortly.
Attachment #741051 - Attachment is obsolete: true
Attachment #741052 - Attachment is obsolete: true
Attachment #741051 - Flags: review?(neil)
Attachment #808940 - Flags: superreview?(neil)
Attachment #808940 - Flags: review?(irving)
Comment on attachment 808940 [details] [diff] [review]
Part 4: Introduce the new API to nsIMsgHeaderParser

Review of attachment 808940 [details] [diff] [review]:
-----------------------------------------------------------------

r- for cramming a negative number as an error response into an unsigned integer; everything else looks OK to me (though I haven't been able to apply the patch stack to build, test & look for warnings/errors)

::: mailnews/mime/public/nsIMsgHeaderParser.idl
@@ +63,5 @@
> + * designed to take advantage of C++'s features, particularly with respect to
> + * arrays.
> + *
> + * There are two methods for parsing MIME headers, one where RFC 2047 decoding
> + * has already previously occurred and one where it has not. As convenience for

Wording is a bit awkward; "one for strings that have already been RFC 2047 decoded, and one for strings that have not been decoded" maybe?

@@ +66,5 @@
> + * There are two methods for parsing MIME headers, one where RFC 2047 decoding
> + * has already previously occurred and one where it has not. As convenience for
> + * JavaScript comprehensions on the result of these methods, if the header is
> + * empty, an array consisting of a mailbox object with an empty string for both
> + * the name and email is returned.

Not sure I like this; if it's legitimate for a parse to return 0 addresses I'd prefer that we return an empty array and require the caller to check for that condition.

@@ +190,1 @@
>                               [array, size_is(count)] out wstring aEmailAddresses,

Trailing space

::: mailnews/mime/src/nsMsgHeaderParser.cpp
@@ +166,5 @@
>    NS_ENSURE_ARG_POINTER(aNumAddresses);
>  
>    int status = msg_parse_Header_addresses(aLine, aNames, aAddresses);
>    if (status < 0)
>      return NS_ERROR_FAILURE;

Existing code, but should use NS_FAILED(status) and maybe rename to 'rv' or, if warnings in debug builds are appropriate, NS_ENSURE_SUCCESS.

@@ +1543,5 @@
> +
> +  nsCString nameBlob, emailBlob;
> +  uint32_t count;
> +  ParseHeaderAddresses(PromiseFlatCString(aHeader).get(),
> +    getter_Copies(nameBlob), getter_Copies(emailBlob), &count);

check 'count' for an error return, otherwise code below will do terrible things with a negative (== very large unsigned) count

::: mailnews/mime/src/nsMsgHeaderParser.h
@@ +51,5 @@
> +   *                       negative, there has been an error parsing the
> +   *                       header.
> +   */
> +  static nsresult ParseHeaderAddresses(const char *aLine, char **aNames,
> +                                       char **aAddresses, uint32_t *aNumAddresses);

Don't return negative numbers through a uint type to indicate failure. I know we do this in other places in mailnews, but it must be stopped.
Attachment #808940 - Flags: review?(irving) → review-
Comment on attachment 808936 [details] [diff] [review]
Part 3c: Use the C++ API in addrbook

Review of attachment 808936 [details] [diff] [review]:
-----------------------------------------------------------------

Not tested, but looks good.
Attachment #808936 - Flags: review?(mbanner) → review+
Comment on attachment 798696 [details] [diff] [review]
Part 1: Distinguish between building strings for display or MIME

>+  // This is a wasteful copy, but the internal API does not have RFindChar on
>+  // nsAString, only nsString.
>+  nsString display(aDisplay);
>+  // Strip leading/trailing whitespace
>+  MsgCompressWhitespace(display);
It's not a wasteful copy if you're immediately mutating the string anyway...
Comment on attachment 808935 [details] [diff] [review]
Part 2: Implement a C++ API for nsIMsgHeaderParser

>+struct ParsedHeader
It would be nice if you could use an nsCOMArray<msgIAddressObject>, it probably only needs a move constructor. Or you could just use an outparam...

>+class AnyStringArray
I really don't like this class. But a class that adapts an nsTArray<nsCString> to an nsTArray<nsString>& would be fine.

template<uint32_t N>
class Foo
{
  public:
    Foo(nsTArray<nsCString>& aUTF8Array)
      : mUTF8Array(aUTF8Array) {}
    ~Foo();
    operator nsTArray<nsString>&() { return mUTF16Array; }
  private:
    nsTArray<nsCString&> mUTF8Array;
    nsAutoTArray<nsString, N> mUTF16Array;
}

>+    retval.mAddresses[i] = new MsgAddressObject(utf16Name,
>+      NS_ConvertUTF8toUTF16(emailPtr));
The charset is applied to the name but not the address. Is the address really UTF8 or is it ASCII?

>+  // Prefill arrays before we start
>+  names.Clear();
>+  emails.Clear();
>+  names.InsertElementsAt(0, count);
>+  emails.InsertElementsAt(0, count);
If this was a real array you could use SetLength.

>+  if (count == 1 && names[0].IsEmpty() && emails[0].IsEmpty())
...
>+  if (count == 1 && names[0].IsEmpty())
Can the email be empty without the name being empty, or vice versa? Is it correct to behave differently in those cases?

>+  nsTArray<nsString> names;
>+  ExtractAllAddresses(aHeader, names, emails);
Seems a bit wasteful to extract all the names when you don't use them.

>+  nsAutoTArray<nsCString, 1> emails;
>+  ExtractAllAddresses(aHeader , names, emails);
>+
>+  if (emails.Length() > 0)
>+    email = emails[0];
I don't see the point of converting all the emails into UTF8. Just use an nsString array and convert the first element. (Ditto below.)

>+    CopyUTF8toUTF16(email, name);
email was previously copied from UTF16 to UTF8...
Some of this was discussed over IRC, but I'll leave comments here for posterity.

(In reply to neil@parkwaycc.co.uk from comment #83)
> >+struct ParsedHeader
> It would be nice if you could use an nsCOMArray<msgIAddressObject>, it
> probably only needs a move constructor. Or you could just use an outparam...

I can't use an nsCOMArray here, even if it had a move constructor: there's no way to adopt the internal msgIAddressObject** array, and I'd rather not copy. It really needs bug 450881 for this use case.

> >+    retval.mAddresses[i] = new MsgAddressObject(utf16Name,
> >+      NS_ConvertUTF8toUTF16(emailPtr));
> The charset is applied to the name but not the address. Is the address
> really UTF8 or is it ASCII?

The email address, according to EAI, ought to be UTF-8. In practice, it's probably ASCII (all use of IDN probably uses ACE for the moment right now). This particular implementation is temporary until I can get the JSMime version checked in, where charset concerns are much more consolidated and handled properly.

> >+  if (count == 1 && names[0].IsEmpty() && emails[0].IsEmpty())
> ...
> >+  if (count == 1 && names[0].IsEmpty())
> Can the email be empty without the name being empty, or vice versa? Is it
> correct to behave differently in those cases?

The latter case, "name" really refers to the displayName, which is (name ? name + " <" + email + ">" : email) in simpler terms. I decided to rename the variable here to "displayAddrs" to communicate it better.

> >+  nsTArray<nsString> names;
> >+  ExtractAllAddresses(aHeader, names, emails);
> Seems a bit wasteful to extract all the names when you don't use them.

The strings aren't going to be copied (at least if they are shareable).
Attachment #808935 - Attachment is obsolete: true
Attachment #808935 - Flags: review?(neil)
Attachment #8337293 - Flags: review?(neil)
Comment on attachment 8337293 [details] [diff] [review]
Part 2: Implement a C++ API for nsIMsgHeaderParser

>+  ~UTF16ArrayAdapter() { detail::DoConversion(mUTF16Array, mUTF8Array); }
Could make the destructor out-of-line and do the conversion there.

>+  aUTF8Array.Clear();
>+  aUTF8Array.SetCapacity(count);
Could use SetLength(count)...

>+  for (uint32_t i = 0; i < count; ++i)
>+    aUTF8Array.AppendElement(NS_ConvertUTF16toUTF8(aUTF16Array.ElementAt(i)));
... and CopyUTF16toUTF8(aUTF16Array[i], aUTF8Array[i]);

>+    email = NS_ConvertUTF16toUTF8(emails[0]);
Nit: CopyUTF16toUTF8(emails[0], email);

>+  nsAutoTArray<nsCString, 1> names, emails;
nsString...

>+    name = names[0];
>+    email = emails[0];
...CopyUTF16toUTF8

>+  nsAutoTArray<nsString, 1> names;
>+  nsAutoTArray<nsString, 1> emails;
Inconsistent with names, emails above.

>+    email = NS_ConvertUTF16toUTF8(emails[0]);
CopyUTF16toUTF8 again (NS_ConvertXXX is really designed as a temporary for use as a function parameter)
Attachment #8337293 - Flags: review?(neil) → review+
(In reply to neil@parkwaycc.co.uk from comment #87)
> Comment on attachment 8337293 [details] [diff] [review]
> Part 2: Implement a C++ API for nsIMsgHeaderParser
> 
> >+  ~UTF16ArrayAdapter() { detail::DoConversion(mUTF16Array, mUTF8Array); }
> Could make the destructor out-of-line and do the conversion there.

Not if UTF16ArrayAdapter is a template.

> >+    email = NS_ConvertUTF16toUTF8(emails[0]);
> CopyUTF16toUTF8 again (NS_ConvertXXX is really designed as a temporary for
> use as a function parameter)

Maybe with clever use of rvalue references, the need to have three variants of functions for converting UTF16 to UTF8 would go away.
(In reply to Joshua Cranmer from comment #88)
> UTF16ArrayAdapter is a template.
Ah yes, I'd forgotten already.

> Maybe with clever use of rvalue references, the need to have three variants
> of functions for converting UTF16 to UTF8 would go away.
Nothing to do with rvalue references, actually.
Attachment #808938 - Attachment is obsolete: true
Attachment #808938 - Flags: review?(neil)
Attachment #8340856 - Flags: review?(neil)
Attachment #808939 - Attachment is obsolete: true
Attachment #808939 - Flags: review?(neil)
Attachment #8340857 - Flags: review?(neil)
Attachment #8340856 - Flags: review?(neil) → review+
Comment on attachment 8340857 [details] [diff] [review]
Part 3g: Use the C++ API elsewhere

>+  nsresult getMailboxList(nsIMsgCompFields *compFields,
>+    nsTArray<nsCString> &mailboxes);
[I'm conflicted. Half of the users actually want UTF16.]

>diff --git a/mailnews/mapi/mapihook/src/msgMapiImp.cpp b/mailnews/mapi/mapihook/src/msgMapiImp.cpp
[I bet this has been abusing UTF8 when it really wants ANSI or UTF16.]
Attachment #8340857 - Flags: review?(neil) → review+
(In reply to neil@parkwaycc.co.uk from comment #92)
> Comment on attachment 8340857 [details] [diff] [review]
> Part 3g: Use the C++ API elsewhere
> 
> >+  nsresult getMailboxList(nsIMsgCompFields *compFields,
> >+    nsTArray<nsCString> &mailboxes);
> [I'm conflicted. Half of the users actually want UTF16.]

One thing this bug has made painfully aware to me is that having both UTF-16 and UTF-8 strings in our codebase is a really bad idea. Compose code is generally the worst, since it always seems to have its strings in the wrong encoding.

> >diff --git a/mailnews/mapi/mapihook/src/msgMapiImp.cpp b/mailnews/mapi/mapihook/src/msgMapiImp.cpp
> [I bet this has been abusing UTF8 when it really wants ANSI or UTF16.]

This is used only for the MAPIReadMail function which, coincidentally enough, doesn't have a W equivalent (like MAPISendMail/MAPISendMailW). [Intriguingly enough, all the Simple MAPI functions save MAPISendMailW/MAPISendMailHelper are deprecated.]

I'm planning on landing parts 1-3 as soon as the next release branches.
Depends on: 950333
Hopefully, this is the last version of this part, and I can land it before the next uplift.
Attachment #808940 - Attachment is obsolete: true
Attachment #808940 - Flags: superreview?(neil)
Attachment #8348502 - Flags: superreview?(neil)
Attachment #8348502 - Flags: review?(irving)
Comment on attachment 8348502 [details] [diff] [review]
Part 4: Introduce the new API to nsIMsgHeaderParser

Review of attachment 8348502 [details] [diff] [review]:
-----------------------------------------------------------------

Fix the patch header, of course. Other than that, looks good.

::: mailnews/mime/src/nsMsgHeaderParser.cpp
@@ +1536,5 @@
> +  NS_ENSURE_ARG_POINTER(retval);
> +
> +  // We are not going to be implementing group parsing yet.
> +  if (preserveGroups)
> +    return NS_ERROR_NOT_IMPLEMENTED;

Is this a regression compared to the old implementation?
Attachment #8348502 - Flags: review?(irving) → review+
Comment on attachment 8348502 [details] [diff] [review]
Part 4: Introduce the new API to nsIMsgHeaderParser

Review of attachment 8348502 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/mime/public/nsIMsgHeaderParser.idl
@@ +48,5 @@
> +   * Due to the limitations of XPIDL, the type of this attribute cannot be
> +   * properly reflected. It is actually an array of msgIAddressObject instances,
> +   * although it is instead undefined if this object does not represent a group.
> +   */
> +  readonly attribute jsval members;

Ah, brief correction here worth meriting:

When doing further development on jsmime, I elected to rename the 'members' to 'group' as a slightly better reflection of intent. I apparently neglected to go back and change my patch queue to take this into account, though. I'll try to upload a new patch with this change soon.
An updated version taking into account the mass-PRUnichar change and the s/members/group/ thing I mentioned above. Carrying forward irving's r+
Attachment #8348502 - Attachment is obsolete: true
Attachment #8348502 - Flags: superreview?(neil)
Attachment #8359866 - Flags: superreview?(neil)
Attachment #8359866 - Flags: review+
(In reply to :Irving Reid from comment #96)
> ::: mailnews/mime/src/nsMsgHeaderParser.cpp
> @@ +1536,5 @@
> > +  NS_ENSURE_ARG_POINTER(retval);
> > +
> > +  // We are not going to be implementing group parsing yet.
> > +  if (preserveGroups)
> > +    return NS_ERROR_NOT_IMPLEMENTED;
> 
> Is this a regression compared to the old implementation?

No--this isn't changing anything in the old parsing implementation. Strictly speaking, the current C++ implementation can't handle the contract of groups correctly (it attaches the group name to the first address in the group, and empty groups are entries with no mailbox). However, trying to get any semblance of sanity with the group format in the C++ version is not worth it when the JSMime 0.2 version is getting close to landing.
Comment on attachment 8359866 [details] [diff] [review]
Part 4: Introduce the new API to nsIMsgHeaderParser

[Presumably this is the version prior to the nsCOMArray improvements]

>   void makeFromDisplayAddress(in AString aDisplayAddresses,
>                               out unsigned long count,
>                               [retval, array, size_is(count)] out msgIAddressObject addresses);
[How did this escape being an optional count?]
Attachment #8359866 - Flags: superreview?(neil) → superreview+
And the final patch has been pushed:
https://hg.mozilla.org/comm-central/rev/00d7b30ca499
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 29.0
Congrats to finally landing this.
Need to include nsMemory.h for this to build with the external api
Attachment #8369049 - Flags: review?(Pidgeot18)
Whiteboard: [external-api-bustage]
Attachment #8369049 - Flags: review?(Pidgeot18) → review+
I use the following code in my addon CompactHeader (https://addons.mozilla.org/thunderbird/addon/compactheader/):
    var msgHeaderParser = Components.classes["@mozilla.org/messenger/headerparser;1"]
                                    .getService(Components.interfaces.nsIMsgHeaderParser);
    numAddresses = msgHeaderParser.parseHeadersWithArray(emailAddresses, addresses, names, fullNames);

Does the patches of this bug mean that I have to change my code?
(In reply to Joachim Herb from comment #104)
> I use the following code in my addon CompactHeader
> (https://addons.mozilla.org/thunderbird/addon/compactheader/):
>     var msgHeaderParser =
> Components.classes["@mozilla.org/messenger/headerparser;1"]
>                                    
> .getService(Components.interfaces.nsIMsgHeaderParser);
>     numAddresses = msgHeaderParser.parseHeadersWithArray(emailAddresses,
> addresses, names, fullNames);
> 
> Does the patches of this bug mean that I have to change my code?

As described in the newsgroup posting, your code as written will still work, but the method you are currently using is obsolete and will be removed, likely in the TB 32-38 timeframe.
Please see Newsgroup m.d.a.t with
TB 29.0a2 (2014-02-05) -- no autocomplete for addressing with compose
news://news.mozilla.com:119/NOGdnU5RfNzW0W7PnZ2dnUVZ_rqdnZ2d@mozilla.org

Just to add: Yes I have an extension also using autocomplete. Also that extensions isn't involved with normal compose I disabled that XPI. And also with that every character entered to the address text box (TO, CC ..) will throw this two errors:
> Warning: ReferenceError: assignment to undeclared variable i
> Source file: jar:file:///media/DATA/_Mozilla/TB_gW/Profiles/ouh3uj2w.2012_01/extensions/%7B3e17310d-82e8-4a43-bd2f-7c3055bfe589%7D.xpi!/components/nsAbAutoCompleteSearch.js
> Line: 422
>  ----------
> Error: this._parser.makeFullAddress is not a function
> Source file: jar:file:///media/DATA/_Mozilla/TB_gW/Profiles/ouh3uj2w.2012_01/extensions/%7B3e17310d-82e8-4a43-bd2f-7c3055bfe589%7D.xpi!/components/nsAbAutoCompleteSearch.js
> Line: 428 


Worth to add:  this is with compose a message to normal mail accounts, only. With [Newsgroup.] it's offering the available group names.
I read that the documentation will be added in part 5, but part 5 seems to be lost. Will the new functions be documentated? dev-doc-needed?
Depends on: 999269
Depends on: 1054252
Depends on: 1085731
Depends on: 1289939
You need to log in before you can comment on or make changes to this bug.