Closed Bug 858337 Opened 11 years ago Closed 10 years ago

Implement header parsing in JSMime

Categories

(MailNews Core :: MIME, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 32.0

People

(Reporter: jcranmer, Assigned: jcranmer)

References

(Depends on 1 open bug, Blocks 2 open bugs)

Details

(Keywords: addon-compat)

Attachments

(2 files, 15 obsolete files)

78.32 KB, patch
Irving
: review+
Details | Diff | Splinter Review
5.41 KB, patch
Details | Diff | Splinter Review
Just as the summary says. By the time the patches on this bug and its dependencies land, nsIMimeConverter and nsIMsgHeaderParser will be implemented by jsmime.

This is being split from bug 842632 to make it clearer which patches are necessary to get things to apply. The entire stack will look like:

* Patches in bug 842632 (Parts 1, 2, 3a-g, and 4)
* Patches in bug 790855 (Parts 1 through 4 [if I don't fold them before posting])
* Patches in this bug (Parts 1 through... 8 maybe?)
* The final nsIMimeConverter implementation patches, probably going in bug 790855.
Attached patch Part 1: Add address parsing (obsolete) — Splinter Review
Attachment #735784 - Flags: review?(irving)
Attached patch Part 2: Add address emitting (obsolete) — Splinter Review
Attachment #735786 - Flags: review?(irving)
You may start noticing a pattern of tweaking tests as I move around implementations. Since I'm a nice guy who tries to make sure that each individual patch compiles and passes all tests on its own, I can't fix a (really) broken test except in the exact patch where I move the broken implementation to the good implementation.
Attachment #735790 - Flags: review?(neil)
I hate nsIMsgCompFields::SplitRecipients with a passion at this point.

Well, at least here's 1500 lines of code down the drain!
Attachment #735794 - Flags: review?(neil)
Attachment #735796 - Flags: review?(mbanner)
I didn't even know until now that Lightning used these APIs...
Attachment #735797 - Flags: review?(philipp)
Which is to say, the address book and a touch of gloda.
Attachment #735799 - Flags: review?(mbanner)
Attached patch Part 5d: Use the new API in mail (obsolete) — Splinter Review
... that mozmill test ...
Attachment #735802 - Flags: review?(mconley)
I'll admit I didn't test this, but it's much more straightforward than some of the corresponding changes in mail, I think.
Attachment #735804 - Flags: review?(mnyromyr)
The final patch (well, the greek patches over in the other bug are after this in my queue).
Attachment #735806 - Flags: review?(neil)
And, for the record, the diffstat for all three of these bugs comes out to:
 101 files changed, 3080 insertions(+), 4165 deletions(-)
Attachment #735797 - Flags: review?(philipp) → review+
Comment on attachment 735790 [details] [diff] [review]
Part 3: Implement nsIMsgHeaderParser in JS

Review of attachment 735790 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/mime/src/mimeJSComponents.js
@@ +45,5 @@
> +};
> +
> +var EmailGroup = {
> +  toString: function () {
> +    return this.name + ": " + [x.toString for (x of this.members)].join(", ");

Not toString()?
Comment on attachment 735796 [details] [diff] [review]
Part 5a: Use the new API in MIME tests

Review of attachment 735796 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/mime/test/unit/test_nsIMsgHeaderParser2.js
@@ -55,5 @@
>    ];
>  
> -  // this used to cause memory read overruns
> -  let addresses = {}, names = {}, fullAddresses = {};
> -  MailServices.headerParser.parseHeadersWithArray("\" \"@a a;b", addresses, names, fullAddresses);

Why remove this test? Can't we reform it in some way?

::: mailnews/mime/test/unit/test_parseHeadersWithArray.js
@@ -6,5 @@
> -Components.utils.import("resource:///modules/mailServices.js");
> -
> -function run_test() {
> -  let addresses = {}, names = {}, fullAddresses = {};
> -  let n = MailServices.headerParser.parseHeadersWithArray("example@host.invalid",

AFAICT, parseHeadersWithArray is only deprecated in the new world post bug 842632, therefore we should keep this test until we actually remove it.
Attachment #735796 - Flags: review?(mbanner) → review-
Comment on attachment 735799 [details] [diff] [review]
Part 5c: Use the new API in mailnews

Review of attachment 735799 [details] [diff] [review]:
-----------------------------------------------------------------

Generally looks fine, but I'll need to review/test fully once the rest lands.
Attachment #735799 - Flags: review?(mbanner) → feedback+
Attachment #735790 - Attachment is obsolete: true
Attachment #735790 - Flags: review?(neil)
Attachment #741054 - Flags: review?(neil)
Bitrot, bitrot, bitrot!
Attachment #735794 - Attachment is obsolete: true
Attachment #735794 - Flags: review?(neil)
Attachment #741057 - Flags: review?(neil)
Most of Part 5* have changes, but I'll hold off on reposting those until I get more of bug 842632 reviewed.
Attachment #735802 - Flags: review?(mconley) → feedback?(mconley)
Attachment #735804 - Flags: review?(mnyromyr) → feedback?(mnyromyr)
Comment on attachment 741057 [details] [diff] [review]
Part 4: Use the JS implementation in MimeHeaderParser.cpp

Autocompletion failure...
Attachment #741057 - Attachment description: Part 4: Move clang static-initializers detection over → Part 4: Use the JS implementation in MimeHeaderParser.cpp
Comment on attachment 735804 [details] [diff] [review]
Part 5e: Use the new API in suite

(I didn't actually applied the whole shebang of patches needed to test this, the comments below are just based upon mere reading.)

Generally, overzealous line wrapping (OLW) is not needed, it only makes the code harder to read.

>+              recipient = gMimeHeaderParser.makeMimeHeader(
>+                [gMimeHeaderParser.makeMailboxObject(fullValue.name,
>+                  fullValue.email)], 1);

OLW. Better (for readability) store the makeMailboxObject in a variable in one line and create the recipient from that in another line.

>+  let addresses = [x.toString() for (x of
>+    hdrParser.parseEncodedHeader(strippedAddresses, null, false))];

OLW.
And isn't that equivalent to "let addresses = hdrParser.parseEncodedHeader(stuff);"? [1]

>+  if (addresses[0])

addresses may be empty, afaict, so this is not safe then.

>+  var emailAddress = headerParser.parseDecodedHeader(
>+    msgHdr.mime2DecodedAuthor, false)[0].email;

OLW.

>+    var emailAddress = headerParser.parseDecodedHeader(
>+      aMsgHdr.mime2DecodedAuthor, false)[0].email;

OLW.

>+  var author = headerParser.parseDecodedHeader(
>+    msgHdr.mime2DecodedAuhor, false)[0];

OLW. And I think you mean "mime2DecodedAuthor"?

>-    var args = {primaryEmail:authorEmailAddress, displayName:names.value[0],
>+    var args = {primaryEmail:author.email, displayName:author.name,
>                 allowRemoteContent:true};

ULW ;-)
Just throw the second param on its own line as well.

>-          msgHeaderParser.extractHeaderAddressMailboxes(
>-            currentHeaderData.sender.headerValue) + kMailboxSeparator;
>+          msgHeaderParser.parseEncodedHeader(
>+            currentHeaderData.sender.headerValue, null, false)[0].email +

OLW.

>-          msgHeaderParser.extractHeaderAddressMailboxes(
>-            currentHeaderData.from.headerValue) + kMailboxSeparator;
>+          msgHeaderParser.parseEncodedHeader(
>+            currentHeaderData.from.headerValue, null, false)[0].email +

OLW.

f=me with these, assumed [1] is sufficiently explained.
Attachment #735804 - Flags: feedback?(mnyromyr) → feedback+
Attachment #735784 - Flags: review?(irving) → review+
Comment on attachment 735786 [details] [diff] [review]
Part 2: Add address emitting

Review of attachment 735786 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/mime/jsmime/mimeAssemblerHeaders.js
@@ +274,5 @@
> +  let needsComma = false;
> +  for (let addr of addresses) {
> +    // Ignore a dummy empty address.
> +    if (addr.email !== undefined && addr.email === "")
> +      continue;

This looks a bit odd; are you trying to say

if ("email" in addr && addr.email === "")

or do you actually set addr.email = undefined for a special case?
Attachment #735786 - Flags: review?(irving) → review+
Comment on attachment 735802 [details] [diff] [review]
Part 5d: Use the new API in mail

Review of attachment 735802 [details] [diff] [review]:
-----------------------------------------------------------------

Beyond my usual "replace vars with lets" complaint, this looks reasonable. Thanks Joshua.

::: mail/base/content/mailWindowOverlay.js
@@ +2950,5 @@
>    var msgHdr = gMessageDisplay.displayedMessage;
>    if (!msgHdr)
>      return;
>  
> +  var author = MailServices.headerParser.parseDecodedHeader(

let, not var

::: mail/base/content/msgHdrViewOverlay.js
@@ +552,5 @@
>        // mailing lists, but not that useful).
>        if (("from" in currentHeaderData) &&
>            ("to" in currentHeaderData) &&
>            ("reply-to" in currentHeaderData)) {
> +        var replyToMailbox = MailServices.headerParser.parseEncodedHeader(

Replace these var's with let's.

@@ +1170,5 @@
>  {
>    if (!emailAddresses)
>      return;
>  
> +  var addresses = MailServices.headerParser

let, not var
Attachment #735802 - Flags: feedback?(mconley) → feedback+
Attachment #735784 - Attachment is obsolete: true
Attachment #735786 - Attachment is obsolete: true
Attachment #741054 - Attachment is obsolete: true
Attachment #741054 - Flags: review?(neil)
Attachment #741057 - Attachment is obsolete: true
Attachment #741057 - Flags: review?(neil)
Attachment #735796 - Attachment is obsolete: true
Attachment #735797 - Attachment is obsolete: true
Attachment #735799 - Attachment is obsolete: true
Attachment #735802 - Attachment is obsolete: true
Attachment #735804 - Attachment is obsolete: true
Attachment #735806 - Attachment is obsolete: true
Attachment #735806 - Flags: review?(neil)
[Apologies for bugspam, there's no easy way to mass-obsolete patches without uploading a new one]

After some consideration, I've changed the landing strategy for the next steps of JSMime, which makes the existing patches in the queue outside of bug 842632 obsolete.
Depends on: 959309
Attached patch headerparser-jsimpl (obsolete) — Splinter Review
This does require a lot of small changes to tests due to differences in how we handle error cases.
Attachment #8402244 - Flags: superreview?(standard8)
Attachment #8402244 - Flags: review?(irving)
Blocks: 790855
No longer depends on: 790855
Comment on attachment 8402244 [details] [diff] [review]
headerparser-jsimpl

Review of attachment 8402244 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/mime/test/unit/test_nsIMsgHeaderParser2.js
@@ +48,5 @@
>       // extractHeaderAddressName returns unquoted names, hence the difference.
>       "Joe Q. Public" ],
>      // Bug 549931
>      ["Undisclosed recipients:;",
> +     "", // Mailboxes

Have you verified that this change doesn't adversely affect the display of messages received from Undisclosed recipients?

@@ -61,5 @@
> -  // This checks that the mime header parser doesn't march past the end
> -  // of strings with ":;" in them. The second ":;" is required to force the
> -  // parser to keep going.
> -  do_check_eq(MailServices.headerParser.extractHeaderAddressMailboxes(
> -    "undisclosed-recipients:;\0:; foo <ghj@veryveryveryverylongveryveryveryveryinvalidaddress.invalid>"),

iirc, it was a security issue that introduced this check. Are we sure there are no possible equivalents to this?
Comment on attachment 8402244 [details] [diff] [review]
headerparser-jsimpl

Review of attachment 8402244 [details] [diff] [review]:
-----------------------------------------------------------------

Code looks good, I'd like to see a few more tests before r+

::: mailnews/mime/src/mimeJSComponents.js
@@ +127,5 @@
> +    let address = this.parseDecodedHeader(aHeader, false)[0];
> +    return address.name || address.email;
> +  },
> +
> +  removeDuplicateAddresses: function (aAddrs, aOtherAddrs) {

I'd like to see cases in the removeDuplicateAddresses test suite to cover mail groups (and IDN, if you can make some todo tests).

@@ +183,5 @@
> +  },
> +
> +  makeFromDisplayAddress: function (aDisplay, count) {
> +    // The basic idea is to split on every comma, so long as there is a
> +    // preceding @.

Are there boundary conditions in here that could bite us? Do we need test cases?
Attachment #8402244 - Flags: review?(irving) → feedback+
(In reply to Mark Banner (:standard8) from comment #24)
> @@ -61,5 @@
> > -  // This checks that the mime header parser doesn't march past the end
> > -  // of strings with ":;" in them. The second ":;" is required to force the
> > -  // parser to keep going.
> > -  do_check_eq(MailServices.headerParser.extractHeaderAddressMailboxes(
> > -    "undisclosed-recipients:;\0:; foo <ghj@veryveryveryverylongveryveryveryveryinvalidaddress.invalid>"),
> 
> iirc, it was a security issue that introduced this check. Are we sure there
> are no possible equivalents to this?

The header itself is checked in a JSMime test. In general, we're very resilient to null characters in pure JS unlike in C.

(In reply to :Irving Reid from comment #25)
> I'd like to see cases in the removeDuplicateAddresses test suite to cover
> mail groups (and IDN, if you can make some todo tests).

My current plan for IDN involves making a new section of jsmime called emailutils (for lack of a better name) which will support a function that checks for equality of email addresses in the presence of things like localpart quoting or IDN. Most of the interesting IDN tests would end up there, so I don't think I need IDN todo tests in nsIMsgHeaderPaser3.js. Mail groups I can do, however.

> Are there boundary conditions in here that could bite us? Do we need test
> cases?

test_nsIMsgHeaderParser4.js tests these already.
(In reply to Mark Banner (:standard8) from comment #24)
> Have you verified that this change doesn't adversely affect the display of
> messages received from Undisclosed recipients?

Depends on your definition of "adversely affect" :-) The patch by itself would display it with an empty line, like so:
+---------------------
|To
|Subject Test email
+---------------------

Otherwise, it appears to work fine. I did try out a small translator specifically for nsMsgHdrViewOverlay.js that adds the elements back in, using nsIMsgHeaderParser.parseEncodedHeader (oh how I would love structured headers right now). I figure it's better to tackle handling of groups on an individual callee basis rather than trying to make the API support the magical-and-vaguely-wrong handling of the original method.
I'd like to hope for a quick turnaround on this patch: this blocks landing the mime converter changes (the context diffs are not trivially modifiable), which REALLY needs to go into TB 31.
Attachment #8402244 - Attachment is obsolete: true
Attachment #8402244 - Flags: superreview?(standard8)
Attachment #8408777 - Flags: superreview?(standard8)
Attachment #8408777 - Flags: review?(irving)
This restores the handling of display groups to what it would have been like in the C++ code era by doing it manually in the header parsing code, which is arguably a far better place for it then manually stuffing it onto elements or creating fake stuff in parseMailboxes stuff.
Attachment #8408778 - Flags: review?(standard8)
Comment on attachment 8408777 [details] [diff] [review]
headerparser-jsimpl (landed in comment 42)

Review of attachment 8408777 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/mime/src/mimeJSComponents.js
@@ +151,5 @@
> +          return false;
> +        allAddresses.add(key);
> +      } else {
> +        // Groups -> filter out all the member addresses.
> +        e.group = e.group.filter(filterAccept);

good thing i asked for group filtering tests ;->

::: mailnews/mime/test/unit/test_nsIMsgHeaderParser3.js
@@ +53,5 @@
> +      otherAddrs: "foo bar <ghj@foo.invalid>",
> +      expectedResult: "A group: foo bar <foo@bar.invalid>;" },
> +    { addrs: "A group: foo bar <foo@bar.invalid>;, foo <ghj@foo.invalid>",
> +      otherAddrs: "foo <foo@bar.invalid>",
> +      expectedResult: "A group: ; , foo <ghj@foo.invalid>" },

Does this behaviour match what happened before - duplicate addresses are removed from groups, rather than from other places they occur in the address lists? If so, r+. I'm not finding an easy way to create a message with list syntax in To: using TB, or I'd check myself...
Attachment #8408777 - Flags: review?(irving) → review+
(In reply to :Irving Reid from comment #30)
> Does this behaviour match what happened before - duplicate addresses are
> removed from groups, rather than from other places they occur in the address
> lists? If so, r+. I'm not finding an easy way to create a message with list
> syntax in To: using TB, or I'd check myself...

The C++ header parser never really admitted the existence of groups in the first place; the closest it comes is prefixing the name of the group onto the first element in the group and then pretending that the entire array was flat (in contrast, this preserves the group structure).
Will this be in TB31? I would like to have it in TB31 so I can use the new API for my addon.
Thanks.
Sorry for the spam, But I didn't intend to change the flag.
I dearly hope so, but that is entirely dependent on someone providing me with a review.
Comment on attachment 8408778 [details] [diff] [review]
Handle display of groups more sanely

Review of attachment 8408778 [details] [diff] [review]:
-----------------------------------------------------------------

Whilst this fixes display in messages, it doesn't fix it in the message list display (receipent column). I'm also wondering if gloda is going to handle the results properly as well.
Attachment #8408778 - Flags: review?(standard8) → review-
Comment on attachment 8408777 [details] [diff] [review]
headerparser-jsimpl (landed in comment 42)

Review of attachment 8408777 [details] [diff] [review]:
-----------------------------------------------------------------

So I'm generally happy with this, apart from the group handling already commented. I think we need to ensure that at least the message list and gloda are able to handle the new format properly before we can land this.
Attachment #8408777 - Flags: superreview?(standard8)
(In reply to Mark Banner (:standard8) from comment #35)
> Comment on attachment 8408778 [details] [diff] [review]
> Handle display of groups more sanely
> 
> Review of attachment 8408778 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Whilst this fixes display in messages, it doesn't fix it in the message list
> display (receipent column).

For recipient column, pls also note
Bug 522886 - Implement "Recipients" column that shows all recipients (To, CC, and BCC)
which has a recent patch which is quite close
(In reply to Mark Banner (:standard8) from comment #35)
> Whilst this fixes display in messages, it doesn't fix it in the message list
> display (receipent column). I'm also wondering if gloda is going to handle
> the results properly as well.

I'm inclined to disagree. The recipient column is only usually handled in the Sent mail or Drafts column, which we aren't usually going to pick up groups in the first place. In other cases, we'll just no longer show the group name, which we arguably don't even preserve very well. The only case where group names usually show up in practice is undisclosed-recipients, and considering that no one complains about empty column values in the Drafts case already, I don't forsee it being a user complaint.

I'm unaware of where gloda actually displays the To: value offhand, and I know I've seen places where some of our code (I think it was gloda, but all JS header parsing code blends together in my fuzzy memory) tries to go through and strip off all the group names.

An alternative would be landing these patches as-is and fixing them in follow-ups. I really need these patches to land very soon, because it is blocking a very large amount of work.
Per an IRC conversation, I'll land the relevant patches when the tree reopens and I've posted a patch for the gloda view issue.
Attached patch Fix indexing in gloda (obsolete) — Splinter Review
It's impressive how much parseEncodedHeader(blah, true) works. The failures of dynamically-typed languages. :-)

Testing this is slightly annoying as it affects messages during the indexing phase. Lots and lots and lots of rm global-messages-db.sqlite went into this patch.
Attachment #8434184 - Flags: review?(standard8)
Sorry, I decided to merge the last two patches into a single patch.
Attachment #8408778 - Attachment is obsolete: true
Attachment #8434184 - Attachment is obsolete: true
Attachment #8434184 - Flags: review?(standard8)
Attachment #8434188 - Flags: review?(standard8)
The main implementation itself is landed:
https://hg.mozilla.org/comm-central/rev/a8d1e10d39b7

Fixing up the header view and the gloda will land when they get an r+.
(In reply to Joshua Cranmer [:jcranmer] (away June 7-12) from comment #4)
> Created attachment 735794 [details] [diff] [review]
> Part 4: Use the JS implementation in MimeHeaderParser.cpp
> 
> I hate nsIMsgCompFields::SplitRecipients with a passion at this point.

Joshua, do your patches here happen to fix bug 919953?
Looks like bug 242693 for allowing ";" as alternative separator took it a bit too far and we lost RFC822 groups support (named lists with semicolon, e.g. testlist: name1@foo.com, name2@bar.com; )
Blocks: 919953
Depends on: 1023285
Comment on attachment 8434188 [details] [diff] [review]
Fix groups in the display

Review of attachment 8434188 [details] [diff] [review]:
-----------------------------------------------------------------

Shouldn't parseDecodedHeader be used here; the mime emitter sends a nsIUTF8StringEnumerator of AUTF8String. The above patch regressed Bug 1023285 and other internationalized headers cases.

It's sort of unexpected that there aren't tests to catch this.
A further regression from the change to parseHeadersWithArray() is that addresses with no addr-spec are being returned with emtpy angle brackets, ie "John Doe" in the header becomes "John Doe <>".  Is this intentional (meaning the caller now has to clean it up it the caller's context), or an oversight?
Comment on attachment 8434188 [details] [diff] [review]
Fix groups in the display

Review of attachment 8434188 [details] [diff] [review]:
-----------------------------------------------------------------

I've just tested this, and I couldn't see the undisclosed-recipients in the sent folder, nor in the gloda open as list views.
Attachment #8434188 - Flags: review?(standard8)
(In reply to Mark Banner (:standard8) from comment #46)
> I've just tested this, and I couldn't see the undisclosed-recipients in the
> sent folder, nor in the gloda open as list views.

Getting it to display with gloda requires rebuilding the index. It works fine for me in my local tests.
(In reply to Joshua Cranmer [:jcranmer] from comment #47)
> (In reply to Mark Banner (:standard8) from comment #46)
> > I've just tested this, and I couldn't see the undisclosed-recipients in the
> > sent folder, nor in the gloda open as list views.
> 
> Getting it to display with gloda requires rebuilding the index. It works
> fine for me in my local tests.

so sahll we bump de gloda schema to force a reindex ?
Depends on: 1055077
(In reply to alta88 from comment #45)
> A further regression from the change to parseHeadersWithArray() is that
> addresses with no addr-spec are being returned with emtpy angle brackets, ie
> "John Doe" in the header becomes "John Doe <>".  Is this intentional
> (meaning the caller now has to clean it up it the caller's context), or an
> oversight?
Flags: needinfo?(Pidgeot18)
I have probably hit something similar in bug 391717.
(In reply to alta88 from comment #45)
> A further regression from the change to parseHeadersWithArray() is that
> addresses with no addr-spec are being returned with emtpy angle brackets, ie
> "John Doe" in the header becomes "John Doe <>".  Is this intentional
> (meaning the caller now has to clean it up it the caller's context), or an
> oversight?

Well, in theory we shouldn't have addresses without an addr-spec. Since it doesn't look like this theory holds in practice, we probably ought to make such cases default to an addrspec with an empty name or something to generate so we can at least roundtrip such values.
Flags: needinfo?(Pidgeot18)
Blocks: 1066409
Moving the followups for the group logic to bug 1066409.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 32.0
I've heard reports that addresses like "Neil @ Parkway" <neil@parkwaycc.co.uk> are getting mangled in SeaMonkey 2.29 when they didn't before, could that be related to this change (or maybe there's already a bug on this and I just haven't found it yet?)
No longer blocks: 853437
No longer blocks: 286760
Depends on: 1082896
Attachment #8408777 - Attachment description: headerparser-jsimpl → headerparser-jsimpl (landed in comment 42)
Depends on: 1146099
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: