Open Bug 1994529 Opened 3 months ago Updated 7 hours ago

Filtering messages by custom header doesn't work when fetching mail for EWS folders

Categories

(MailNews Core :: Networking: Exchange, defect)

defect

Tracking

(Not tracked)

REOPENED
149 Branch

People

(Reporter: benc, Assigned: benc)

References

(Blocks 2 open bugs)

Details

Attachments

(8 files, 5 obsolete files)

1.71 KB, text/xml
Details
23.42 KB, text/xml
Details
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review

Filters which match a custom header never trigger when new messages are delivered to an EWS folder.

To test:

  1. Run with an EWS account set up
  2. Create a filter which matches a custom header (e.g. "X-Shoe-Size") .
    (In the filter editor, the last item in the header drop-down is "Custom...", which lets you add arbitrary headers to filter on.)
  3. Set up an action you'll notice on the filter: move matching messages to another folder.
  4. Send a test message to the account which matches the filter you just set up.
  5. click to another folder, then back to Inbox to trigger the folder to fetch the newly-arrived message

(aside: For step 4, I've been hand-crafted minimal messages and using msmtp to send it via a real email account)

Observed:
The test message appears in the Inbox

Expected:
The test message should have been moved from the Inbox to the folder specified by the filter action.

The filter does work correctly when manually run (eg "run now" button in the filter editor). But only if the full message has already been downloaded to the local store (as the custom header we want isn't stored in the msgDB, only in the raw rfc5822 data).

When fetching new messages from the server, the existing protocols (IMAP et al) download the raw header chunk (i.e. the RFC 5322 message, without the body), parse it, then pass both the resulting nsIMsgDBHdr AND the raw header chunk to the filtering code (assuming there are no filters which require the message body - if so filtering is deferred until the full message is downloaded).

But the current EWS implementation doesn't fetch the full raw header chunk. It instead (I think) receives structured data about the message as a XML object, containing a requested set of fields (to, from, subject, date etc...).
These fields are used to populate an nsIMsgDBHdr.

So the EWS new-message filtering has the nsIMsgDBHdr available, but NOT the raw header chunk (in RFC5322 format).

This means the user can't set up filters which match headers which aren't covered by nsIMsgDBHdr.
So, for example, this means you can't filter out mailing lists to a different folder (the "List-Id:" header).

EWS does retain all the mail headers (the full message download includes these - use "view source" on a message in an EWS folder to verify this), so the required data is there on the server.

To get filtering on custom headers working, the EWS implementation would need to provide all the headers to pass in to the filtering, not just the ones needed for populating an nsIMsgDBHdr.

So. Questions:

  1. Can EWS fetch raw RFC5322 header blocks instead of structured XML fields?
  2. OR, can we request all the mail headers for a message, then synthesize that into RFC5322 to pass into the filter code
  3. Is this sensible, or all a bit bonkers?

Whichever was things go, It'd be a lot more data to download for new messages than EWS currently sends over the wire - probably an order of magnitude. Modern email transports add a whole heap of bulky cruft into the headers.
But we seem to cope ok with this under IMAP et al, so it's probably fine.

(This could also fit in with the general goal of decoupling IEwsClient from nsIMsgDBHdr)

It looks like we should be able to request at least some headers from the EWS server: https://learn.microsoft.com/en-us/exchange/client-developer/web-service-reference/internetmessageheaders

However, looking at https://learn.microsoft.com/en-us/previous-versions/office/developer/exchange-server-2010/hh545614(v=exchg.140)#getting-internet-message-headers, even if we request this element, it doesn't look like there's an easy solution for getting the full header block as RFC 5322.

I agree it'd be good to get the full RFC 5322 header block so we can pass it to the filtering code (I think right now populating the nsIMsgDBHdr is still easier done using the structured data in the EWS response, considering how painful I remember integrating with our existing RFC 5322 parser to be), but it doesn't look to me like there's an easy solution that doesn't involve downloading the full message.

I'd say the next step should be investigating how helpful the InternetMessageHeaders property is (considering the doc says it might not include the full collection of headers, or needs some tweaking to do so), and how much of a pain it'd be to convert it into RFC 5322 (i.e. the solution you proposed in your second bullet point).

(In reply to Brendan Abolivier [:babolivier] from comment #2)

(I think right now populating the nsIMsgDBHdr is still easier done using the structured data in the EWS response, considering how painful I remember integrating with our existing RFC 5322 parser to be)

Just as an aside, I've since written ParseMsgHeaders(), with the intention of replacing the old stateful, side-effect-replete parsing code.
https://searchfox.org/comm-central/rev/eb7989693310912851e0a39b2b4b572b14dca5f5/mailnews/local/src/nsParseMailbox.cpp#77 .

I'd say the next step should be investigating how helpful the InternetMessageHeaders property is (considering the doc says it might not include the full collection of headers, or needs some tweaking to do so), and how much of a pain it'd be to convert it into RFC 5322 (i.e. the solution you proposed in your second bullet point).

Going from structured data to RFC5322 seems a bit sacrilegious, but hey. Sould be straightforward enough :-)

Looks like internetMessageHeaders isn't complete (e.g. missing "To:" and "From:"!), but there might be a was using PR_TRANSPORT_MESSAGE_HEADERS...
Technical Article here:
https://learn.microsoft.com/en-us/previous-versions/office/developer/exchange-server-2010/hh545614(v=exchg.140)

Do we have any telemetry that can help us understand how widely this is used? We may want to revisit this at a later date and hide the option for EWS. For the benefit of the subset of users who might use this, we're considering adding substantial bulk which I'm not sure is wise...so perhaps that puts me in the 3rd camp.

Personally, I think this is a really important one. It's the difference between EWS being usable for mailinglists or not.
This is by far my number one use case for filtering. Probably 95% of the filters I've set up over the last 15 years have been on "List-Id:"...

Brendan: as discussed today, here's a proposed interface to pass headers over the XPCOM boundary.
The idea is that the IEwsClient would just send an IHeaderBlock object out to the folder, which can be used to both populate the database and turned into a raw RFC822 serialisation to pass to the filtering).
I've got a C++ implementation of IWritableHeaderBlock which could be instantiated from the rust side to pass out (am I right in thinking you can't directly implement XPCOM objects in rust? They need boilerplate in C++, right?).

(It's ended up pretty similar to the msgIStructuredHeaders interfaces, but those make some assumptions I'm not happy with, and tie themselves to the serialisation and parsing more than I'd like. But ultimately I'd probably aim to unify them).

Flags: needinfo?(brendan)
Assignee: nobody → benc
Flags: needinfo?(brendan)

XML example of adding PR_TRANSPORT_MESSAGE_HEADERS to the ItemShape of GetItem requests:
https://stackoverflow.com/questions/51178945/how-to-get-the-raw-content-of-an-outlook-email-message-with-office-js-api

So, I'm including PR_TRANSPORT_MESSAGE_HEADERS in the GetItem request, but the response from the server causes a parsing error:

[ERROR ews_xpcom::safe_xpcom] an error occurred when performing operation SyncFolderItems: Ews(Deserialize(Error { path: Path { segments: [Map { key: "Body" }, Map { key: "GetItemResponse" }, Map { key: "ResponseMessages" }, Map { key: "$value" }, Seq { index: 0 }, Map { key: "Items" }, Map { key: "$value" }, Seq { index: 0 }, Enum { variant: "Message" }, Map { key: "ExtendedProperty" }, Seq { index: 0 }] }, original: Custom("missing field `extended_field_URI`") }))

My reading of the message is that the returned message holds ExtendedProperty values, but the deserialization is choking because they don't contain ExtendedFieldURI fields (or they're not being correctly mapped to extended_field_URI, for whatever reason)?

Any guidance on how to troubleshoot/fix this is welcome :-)

A few questions:

  1. Am I right in thinking that all the types are out in https://github.com/thunderbird/ews-rs, so if the definitions need tweaking that's the place to do it?
  2. Before I go looking, is there any logging in there for dumping out the raw XML going over the wire (both ways)?
  3. Are we relying on the incoming XML to be perfectly formed and to match our deserialisation definitions (in ews-rs)? If so, isn't that awfully optimistic and brittle?

(In reply to Ben Campbell from comment #9)

So, I'm including PR_TRANSPORT_MESSAGE_HEADERS in the GetItem request, but the response from the server causes a parsing error:

[ERROR ews_xpcom::safe_xpcom] an error occurred when performing operation SyncFolderItems: Ews(Deserialize(Error { path: Path { segments: [Map { key: "Body" }, Map { key: "GetItemResponse" }, Map { key: "ResponseMessages" }, Map { key: "$value" }, Seq { index: 0 }, Map { key: "Items" }, Map { key: "$value" }, Seq { index: 0 }, Enum { variant: "Message" }, Map { key: "ExtendedProperty" }, Seq { index: 0 }] }, original: Custom("missing field `extended_field_URI`") }))

My reading of the message is that the returned message holds ExtendedProperty values, but the deserialization is choking because they don't contain ExtendedFieldURI fields (or they're not being correctly mapped to extended_field_URI, for whatever reason)?

It looks like the ExtendedProperty type is missing the #[serde(rename_all = "PascalCase")] annotation, so is trying to deserialize the field with a snake_case name instead. That annotation might not be enough though, since pascal case would render it as ExtendedFieldUri, so it might need a #[serde(rename = "ExtendedFieldURI")] annotation on the field too.

Any guidance on how to troubleshoot/fix this is welcome :-)

A few questions:

  1. Am I right in thinking that all the types are out in https://github.com/thunderbird/ews-rs, so if the definitions need tweaking that's the place to do it?

Yep.

  1. Before I go looking, is there any logging in there for dumping out the raw XML going over the wire (both ways)?

I think there's a way to log it to the terminal, but I always just open the dev tools and use the network tab. Because EWS is all over HTTP, all the requests and responses are there, headers and bodies.

  1. Are we relying on the incoming XML to be perfectly formed and to match our deserialisation definitions (in ews-rs)? If so, isn't that awfully optimistic and brittle?

Sort of. Anything we get but don't know about is ignored, anything we think might be present (but not necessarily) is marked optional, anything we expect to always be present but is missing is a deserialization error. In this case, the type expresses that we always expect a field named extended_field_URI to be present in that type, when we should instead always expect a field named ExtendedFieldURI. Maybe it's arguable we should mark everything optional so parsing never fails, but when we expect something to always be there and it isn't, that is something we should probably know about.

Attached file request1.xml
Attached file response1.xml

So, following on from Comment 3, there are two options for returning a full set of headers - you can specify InternetMessageHeaders or PR_TRANSPORT_MESSAGE_HEADERS.

So, those two xml files are captured data where I request both of them:

  1. a GetItem request, asking for both InternetMessageHeaders and PR_TRANSPORT_MESSAGE_HEADERS (see <t:ExtendedFieldURI PropertyTag="0x007D" PropertyType="String"/>).
  2. the response from the EWS server, which includes the InternetMessageHeader elements and the PR_TRANSPORT_MESSAGE_HEADERS data, returned as an ExtendedProperty element.

(I captured these via the raw network logging:

$ MOZ_LOG='ews_xpcom::*:5' THUNDERBIRD_LOG_NETWORK_PAYLOADS=1 ./mach run

)

InternetMessageHeaders is "nicer", as it's structured, where as PR_TRANSPORT_MESSAGE_HEADERS seems to just be a raw RFC5322 header block (with line folding and encoded fields, both of which require a little light parsing).

But, the InternetMessageHeaders is indeed missing the address headers ("From:" and "To:"), as suggested by https://learn.microsoft.com/en-us/previous-versions/office/developer/exchange-server-2010/hh545614(v=exchg.140) .
All the other header entries do appear to be included by "InternetMessageHeaders" in this example.

I think for now, I'm inclined to go with InternetMessageHeaders, and synthesize the missing address fields (we have all the needed details in the response). But if we find there are fields we cannot synthesize, we'll have to drop InternetMessageHeaders and use the PR_TRANSPORT_MESSAGE_HEADERS approach instead (i.e. raw RFC5322 header parsing, like we have to do for the other protocols).

Attachment #9521249 - Attachment description: WIP: Bug 1994529 - Pass full message headers from EwsClient to EwsFolder. → WIP: Bug 1994529 - Get synthesised headers passing from rust to C++ via IHeaderBlock.
Attachment #9521249 - Attachment is obsolete: true

EwsMessageSyncListener already holds a little state. It'll need to hold some
more as we get ready to supply the complete set of message headers for
filtering. And if it's stateful, it's simpler all round to just have the
IEwsMessageSyncListener callbacks as methods rather than std::function
members.

It's a stateful object which manages the Message Sync process, so "Handler"
seems to describe it better than "Listener".

Attachment #9540236 - Attachment description: WIP: Bug 1994529 - Add IHeaderBlock interface. → Bug 1994529 - Add IHeaderBlock interface. r=#thunderbird-back-end-reviewers
Attachment #9540237 - Attachment description: WIP: Bug 1994529 - Add ParseHeaderBlock() for IHeaderBlock->RawHdr parsing. → Bug 1994529 - Add ParseHeaderBlock() for IHeaderBlock->RawHdr parsing. r=#thunderbird-back-end-reviewers
Attachment #9540234 - Attachment is obsolete: true
Attachment #9541170 - Attachment description: WIP: Bug 1994529 - Move Ews message sync code out of EwsFolder. → Bug 1994529 - Move Ews message sync code out of EwsFolder. r=#thunderbird-back-end-reviewers
Attachment #9540233 - Attachment description: Bug 1994529 - Factor out ApplyRawHdrToDbHdr(). r=#thunderbird-back-end-reviewers → Bug 1994529 - Factor out ApplyRawHdrToDbHdr(). r=mkmelin
Attachment #9540236 - Attachment description: Bug 1994529 - Add IHeaderBlock interface. r=#thunderbird-back-end-reviewers → Bug 1994529 - Add IHeaderBlock interface. r=mkmelin
Attachment #9540237 - Attachment description: Bug 1994529 - Add ParseHeaderBlock() for IHeaderBlock->RawHdr parsing. r=#thunderbird-back-end-reviewers → Bug 1994529 - Add ParseHeaderBlock() for IHeaderBlock->RawHdr parsing. r=mkmelin
Attachment #9540235 - Attachment is obsolete: true
Attachment #9540809 - Attachment is obsolete: true
Attachment #9541657 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Target Milestone: --- → 149 Branch

Pushed by kaie@kuix.de:
https://hg.mozilla.org/comm-central/rev/809d408c0384
Factor out ApplyRawHdrToDbHdr(). r=mkmelin
https://hg.mozilla.org/comm-central/rev/67ed32615fa6
Add IHeaderBlock interface. r=mkmelin
https://hg.mozilla.org/comm-central/rev/7ed20c3a3e73
Add ParseHeaderBlock() for IHeaderBlock->RawHdr parsing. r=mkmelin

Status: ASSIGNED → RESOLVED
Closed: 7 hours ago
Resolution: --- → FIXED

there are patches in progress, I assume you're still working on them, so reopening.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: