Closed Bug 998189 Opened 10 years ago Closed 10 years ago

Add a basic structured header interface

Categories

(MailNews Core :: MIME, defect)

defect
Not set
normal

Tracking

(thunderbird36 fixed)

RESOLVED FIXED
Thunderbird 36.0
Tracking Status
thunderbird36 --- fixed

People

(Reporter: jcranmer, Assigned: jcranmer)

References

(Blocks 2 open bugs)

Details

Attachments

(1 file, 2 obsolete files)

Attached file structured-headers (obsolete) —
Not asking for review yet (or even feedback), but I need a bug that pulls out the core structured header concept.
Blocks: 998191
To preempt Neil's probable question: yes, this works on Windows, according to Try.

[For context, the vtable ordering of MSVC does not work quite the same if you have methods overload, which screws over the logic we use in XPTs for xpconnect. However, the overloaded name here has a nonvirtual and a virtual method involved, which doesn't trigger the screwiness.]
Attachment #8408754 - Attachment is obsolete: true
Attachment #8434583 - Flags: superreview?(neil)
Attachment #8434583 - Flags: review?(irving)
Comment on attachment 8434583 [details] [diff] [review]
Add the readable and writable interfaces

>+  jsval getHeader(in string aHeaderName);
So, this could be absolutely anything?

>+  /**
>+   * Retrieve the value of the header as if it is an unstructured header. Such
>+   * headers include most notably the Subject header. If the header is not
>+   * present, than undefined is returned. This is reflected in C++ as an empty
>+   * string with IsVoid() set to true (distinguishing it from an empty string).
>+   *
>+   * @param aHeaderName     The name of the header to retrieve.
>+   */
>+  AString getUnstructuredHeader(in string aHeaderName);
Actually it's an AString, so it gets double-converted into null* either way.
*Or a void string in C++ if you prefer.

>+  nsresult SetAddressingHeader(const char *aPropertyName,
>+                               const nsCOMArray<msgIAddressObject> &addrs)
>+  {
>+    return SetAddressingHeader(aPropertyName,
>+      const_cast<nsCOMArray<msgIAddressObject>&>(addrs).Elements(),
>+      addrs.Length());
>+  }
[This doesn't really save much over passing foo.Elements(), foo.Length() yourself...]

>+function loadEncoderFromUrl(url) {
>+  // Use a safe proxy to make sure that different scripts are isolated from each
>+  // other.
>+  let safeObject = {};
>+  let handler = {
>+    get: function (target, name) {
>+      if (name in target)
>+        return target[name];
>+      if (name in global)
>+        return global[name];
>+    }
>+  };
>+  let safeProxy = new Proxy(safeObject, handler);
>+  Services.scriptloader.loadSubScript(url, safeProxy, "UTF-8");
Not sure what this is achieving here above the usual subscript loader scope mechanism.

>+    return MimeAddressParser.prototype._fixArray(addrs, aPreserveGroups, count);
Eww. Can this function be moved to somewhere that makes the call saner?
(In reply to neil@parkwaycc.co.uk from comment #2)
> Comment on attachment 8434583 [details] [diff] [review]
> Add the readable and writable interfaces
> 
> >+  jsval getHeader(in string aHeaderName);
> So, this could be absolutely anything?

Yes. Valid (or planned to be valid) values could include:
* Arrays of msgIAddressObject (for To and friends)
* JS Date objects (for Date and friends)
* JS Map objects with extra properties (Content-Type, Content-Disposition)
* Strings (Subject et al)
* Arrays of strings (Newsgroups, References and co)

> >+  /**
> >+   * Retrieve the value of the header as if it is an unstructured header. Such
> >+   * headers include most notably the Subject header. If the header is not
> >+   * present, than undefined is returned. This is reflected in C++ as an empty
> >+   * string with IsVoid() set to true (distinguishing it from an empty string).
> >+   *
> >+   * @param aHeaderName     The name of the header to retrieve.
> >+   */
> >+  AString getUnstructuredHeader(in string aHeaderName);
> Actually it's an AString, so it gets double-converted into null* either way.
> *Or a void string in C++ if you prefer.

[spots spelling error and quickly corrects]

Hmm, you're right, xpconnect does convert that into null. It's still IsVoid() in C++.

> >+  nsresult SetAddressingHeader(const char *aPropertyName,
> >+                               const nsCOMArray<msgIAddressObject> &addrs)
> >+  {
> >+    return SetAddressingHeader(aPropertyName,
> >+      const_cast<nsCOMArray<msgIAddressObject>&>(addrs).Elements(),
> >+      addrs.Length());
> >+  }
> [This doesn't really save much over passing foo.Elements(), foo.Length()
> yourself...]

It saves a const_cast. nsCOMArray::Elements() is not a const-method. There is one or two cases later one where I do want to use a const nsCOMArray<msgIAddressObject>&.
> 
> >+function loadEncoderFromUrl(url) {
> >+  // Use a safe proxy to make sure that different scripts are isolated from each
> >+  // other.
> >+  let safeObject = {};
> >+  let handler = {
> >+    get: function (target, name) {
> >+      if (name in target)
> >+        return target[name];
> >+      if (name in global)
> >+        return global[name];
> >+    }
> >+  };
> >+  let safeProxy = new Proxy(safeObject, handler);
> >+  Services.scriptloader.loadSubScript(url, safeProxy, "UTF-8");
> Not sure what this is achieving here above the usual subscript loader scope
> mechanism.

Script isolation. If you load script handler-a.js and then load handler-b.js, handler-b.js can't affect anything in handler-a.js. The use of the proxy lets me safely export jsmime to the subscripts (so they can call jsmime.headerparser.addStructuredDecoder() and similar functions).

> >+    return MimeAddressParser.prototype._fixArray(addrs, aPreserveGroups, count);
> Eww. Can this function be moved to somewhere that makes the call saner?

Yes.
(In reply to Joshua Cranmer from comment #3)
> (In reply to comment #2)
> > (From update of attachment 8434583 [details] [diff] [review])
> > >+  nsresult SetAddressingHeader(const char *aPropertyName,
> > >+                               const nsCOMArray<msgIAddressObject> &addrs)
> > >+  {
> > >+    return SetAddressingHeader(aPropertyName,
> > >+      const_cast<nsCOMArray<msgIAddressObject>&>(addrs).Elements(),
> > >+      addrs.Length());
> > >+  }
> > [This doesn't really save much over passing foo.Elements(), foo.Length()
> > yourself...]
> 
> It saves a const_cast. nsCOMArray::Elements() is not a const-method. There
> is one or two cases later one where I do want to use a const
> nsCOMArray<msgIAddressObject>&.

Sigh, I guess this is because in XPIDL an in array becomes an nsIFoo** instead of an nsIFoo*const*. (I wonder how much would break if that was changed.)

> > >+function loadEncoderFromUrl(url) {
> > >+  // Use a safe proxy to make sure that different scripts are isolated from each
> > >+  // other.
> > >+  let safeObject = {};
> > >+  let handler = {
> > >+    get: function (target, name) {
> > >+      if (name in target)
> > >+        return target[name];
> > >+      if (name in global)
> > >+        return global[name];
> > >+    }
> > >+  };
> > >+  let safeProxy = new Proxy(safeObject, handler);
> > >+  Services.scriptloader.loadSubScript(url, safeProxy, "UTF-8");
> > Not sure what this is achieving here above the usual subscript loader scope
> > mechanism.
> 
> Script isolation. If you load script handler-a.js and then load
> handler-b.js, handler-b.js can't affect anything in handler-a.js. The use of
> the proxy lets me safely export jsmime to the subscripts (so they can call
> jsmime.headerparser.addStructuredDecoder() and similar functions).

The proxy doesn't buy you anything here because the subscript loader sets the scope chain to be the passed-in object and the caller's global scope anyway.
Comment on attachment 8434583 [details] [diff] [review]
Add the readable and writable interfaces

Review of attachment 8434583 [details] [diff] [review]:
-----------------------------------------------------------------

Really only a couple of questions to resolve.

::: mailnews/mime/public/msgIStructuredHeaders.idl
@@ +55,5 @@
> +  /**
> +   * Retrieve the value of the header as if it is an unstructured header. Such
> +   * headers include most notably the Subject header. If the header is not
> +   * present, than undefined is returned. This is reflected in C++ as an empty
> +   * string with IsVoid() set to true (distinguishing it from an empty string).

s/than/then (though really, I'd just leave out the "then")

s/(.+)/(distinguishing it from a header that is present but contains an empty string)

@@ +68,5 @@
> +   * returned.
> +   *
> +   * @param aHeaderName     The name of the header to retrieve.
> +   * @param aPreserveGroups If false (the default), then all groups in the
> +   *                        header will be collapsed into individual elements.

"collapsed" doesn't feel like the right word here. Something like "... the returned array contains all the addresses in the header as a flat list, with group annotations removed. If true, the returned array represents the actual structure of the header, and so may contain group objects that hold lists of addresses or nested groups"

@@ +91,5 @@
> +   * The header names returned may be in different cases depending on the
> +   * precise implementation of this interface, so implementations should not
> +   * rely on an exact kind of case being returned.
> +   */
> +  readonly attribute nsIUTF8StringEnumerator headerNames;

do we really want these to be utf-8 strings? Everything else in this XPCOM API is utf-16 (RFC *822 header names should always be ASCII, I think, but why limit ourselves to that?).

@@ +116,5 @@
> +};
> +
> +/**
> + * An interface that enhances msgIStructuredHeaders by allowing the values of
> + * headers to be modified.

What benefit does this confer, compared to a single header API that has a 'writable' attribute and throws if you use any of the write methods on a read-only instance?

@@ +127,5 @@
> +   *
> +   * @param aHeaderName     The name of the header to store.
> +   * @param aValue          The rich, structured value of the header to store.
> +   */
> +  void setHeader(in string aHeaderName, in jsval aValue);

What happens if I mismatch - pass a structured value to an unstructured header, or pass a string to an addressing header?

@@ +167,5 @@
> +
> +  /**
> +   * Store the value of the header using a raw version as would be represented
> +   * in MIME. So as to handle 8-bit headers properly, the charset needs to be
> +   * specified, although it may be null.

What's the default behaviour if the charset is null? what about if it's invalid?

::: mailnews/mime/src/mimeJSComponents.js
@@ +49,5 @@
> +  getHeader: function (aHeaderName) {
> +    let name = aHeaderName.toLowerCase();
> +    return this._headers.get(name);
> +  },
> +  

white space

::: mailnews/mime/test/unit/test_structured_headers.js
@@ +49,5 @@
> +  let url = Services.io.newFileURI(do_get_file("custom_header.js")).spec;
> +  Cc["@mozilla.org/categorymanager;1"]
> +    .getService(Ci.nsICategoryManager)
> +    .addCategoryEntry("custom-mime-encoder", "X-Unusual", url, false, true);
> +  // The category manager doesn't fire until a later timestep.

I'd prefer that you create and register the Observer promise before you add the category entry, just to make sure you never lose the race - and have the observer check to make sure it was notified of the correct category entry addition, if possible, to avoid random oranges if something else happens to add a category while your test is running.

Also, remove the observer after it fires.

@@ +141,5 @@
> +  do_check_eq(headers.getHeader("To").length, 1);
> +  do_check_eq(headers.getHeader("To")[0].email, "bugmail@example.org");
> +  do_check_eq(headers.getAddressingHeader("To").length, 1);
> +  do_check_eq(headers.getHeader("Content-Type").type, "text/html");
> +  

white space
Attachment #8434583 - Flags: review?(irving) → review-
(In reply to :Irving Reid (Away until July 21) from comment #5)
> do we really want these to be utf-8 strings? Everything else in this XPCOM
> API is utf-16 (RFC *822 header names should always be ASCII, I think, but
> why limit ourselves to that?).

Header names are almost always ASCII (I have seen messages that erroneously contain BOMs, which would reflect the first header having a name of \uFFFEPath or something similar). The choice is between using AString or AUTF8String, and given the diserable use cases from C++ (which usually persist in thinking in ASCII or at least near-ASCII), I'd rather have AUTF8String. Especially now that JS is trying to move away from using UTF-16 strings internally where it can. O_O

> What benefit does this confer, compared to a single header API that has a
> 'writable' attribute and throws if you use any of the write methods on a
> read-only instance?

What may not totally be clear from the patch is that there are two implementations of structured headers, one of which is inherently read-only.

> > +  void setHeader(in string aHeaderName, in jsval aValue);
> 
> What happens if I mismatch - pass a structured value to an unstructured
> header, or pass a string to an addressing header?

Mismatch is not eagerly detected. JSmime's custom structured header support is handled based on how encoding and decoding structured headers work. It's slightly suboptimal, but I can't find any tolerable way around that.

> What's the default behaviour if the charset is null? what about if it's
> invalid?

Null or invalid means there's no fallback if UTF-8 fails. In that case, the fallback boils down to replacing illegal characters with U+FFFD.
(In reply to Joshua Cranmer from comment #6)
> I'd rather have AUTF8String. Especially now that JS is trying to move away
> from using UTF-16 strings internally where it can. O_O
Except it's moving to latin1, not UTF-8.
Attachment #8434583 - Attachment is obsolete: true
Attachment #8434583 - Flags: superreview?(neil)
Attachment #8488391 - Flags: superreview?(neil)
Attachment #8488391 - Flags: review?(irving)
Comment on attachment 8488391 [details] [diff] [review]
Add the readable and writable interfaces

>+let global = this;
>+function loadEncoderFromUrl(url) {
>+  // Use a safe proxy to make sure that different scripts are isolated from each
>+  // other.
>+  let safeObject = {};
>+  let handler = {
>+    get: function (target, name) {
>+      if (name in target)
>+        return target[name];
>+      if (name in global)
>+        return global[name];
>+    }
>+  };
>+  let safeProxy = new Proxy(safeObject, handler);
Not used. sr=me with this removed.

>+  Services.scriptloader.loadSubScript(url, {}, "UTF-8");
Worth inlining?
Attachment #8488391 - Flags: superreview?(neil) → superreview+
Comment on attachment 8488391 [details] [diff] [review]
Add the readable and writable interfaces

Review of attachment 8488391 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with the requested test added and questions explained away.

::: mailnews/mime/public/msgIStructuredHeaders.idl
@@ +60,5 @@
> +   * but contains an empty string).
> +   *
> +   * @param aHeaderName     The name of the header to retrieve.
> +   */
> +  AString getUnstructuredHeader(in string aHeaderName);

I don't see a test for getUnstructuredHeader returns null vs. empty string. This is subtle enough behaviour that it's worth having one.

::: mailnews/mime/src/mimeJSComponents.js
@@ +190,5 @@
> +// XPIDL output.
> +function fixArray(addresses, preserveGroups, count) {
> +  function resetPrototype(obj, prototype) {
> +    let prototyped = Object.create(prototype);
> +    for (var key in obj)

Do you want all keys here, or just own properties?

::: mailnews/mime/test/unit/test_structured_headers.js
@@ +7,5 @@
> +
> +Components.utils.import("resource:///modules/Services.jsm");
> +
> +/// Verify that a specific XPCOM error code is thrown.
> +function verifyError(block, errorCode) {

The a-team (or at least gps) are trying to get us to switch to Assert.jsm style assertions for xpcshell tests (see https://developer.mozilla.org/en-US/docs/Mozilla/QA/Writing_xpcshell-based_unit_tests#Assertions_and_logging), so use Assert.throws() instead of implementing this. I'm not going to demand replacement of all the other do_whatever checks with Assert.jsm for this review, seeing as I haven't completely switched over myself yet.

@@ +18,5 @@
> +  do_check_eq(caught, errorCode);
> +}
> +
> +var StructuredHeaders = CC("@mozilla.org/messenger/structuredheaders;1",
> +                           Ci.msgIWritableStructuredHeaders);

Would it be worth declaring 'headers' (or 'structuredHeaders') in mailServices.js?
Attachment #8488391 - Flags: review?(irving) → review+
(In reply to :Irving Reid from comment #10)
> I don't see a test for getUnstructuredHeader returns null vs. empty string.
> This is subtle enough behaviour that it's worth having one.

See line #117 in test_structured_headers:
do_check_false(headers.hasHeader("Subject"));
do_check_true(headers.getUnstructuredHeader("Subject") === null);

> Do you want all keys here, or just own properties?

Um. It shouldn't matter. But I guess own properties works better.

> Would it be worth declaring 'headers' (or 'structuredHeaders') in
> mailServices.js?

First off, it's not a server. More importantly, I don't imagine this construct being directly used by a lot of people, not enough to warrant being added to a services-like object.
https://hg.mozilla.org/comm-central/rev/b1570c0200ff
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 35.0
OS: Linux → All
Hardware: x86_64 → All
Target Milestone: Thunderbird 35.0 → Thunderbird 36.0
Flags: in-testsuite+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: