Filtering messages by custom header doesn't work when fetching mail for EWS folders
Categories
(MailNews Core :: Networking: Exchange, defect)
Tracking
(Not tracked)
People
(Reporter: benc, Assigned: benc)
References
(Blocks 2 open bugs)
Details
Attachments
(8 files, 5 obsolete files)
|
1.71 KB,
text/xml
|
Details | |
|
23.42 KB,
text/xml
|
Details | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review |
Filters which match a custom header never trigger when new messages are delivered to an EWS folder.
To test:
- Run with an EWS account set up
- Create a filter which matches a custom header (e.g. "X-Shoe-Size") .
(In the filter editor, the last item in the header drop-down is "Custom...", which lets you add arbitrary headers to filter on.) - Set up an action you'll notice on the filter: move matching messages to another folder.
- Send a test message to the account which matches the filter you just set up.
- click to another folder, then back to Inbox to trigger the folder to fetch the newly-arrived message
(aside: For step 4, I've been hand-crafted minimal messages and using msmtp to send it via a real email account)
Observed:
The test message appears in the Inbox
Expected:
The test message should have been moved from the Inbox to the folder specified by the filter action.
The filter does work correctly when manually run (eg "run now" button in the filter editor). But only if the full message has already been downloaded to the local store (as the custom header we want isn't stored in the msgDB, only in the raw rfc5822 data).
| Assignee | ||
Comment 1•3 months ago
|
||
When fetching new messages from the server, the existing protocols (IMAP et al) download the raw header chunk (i.e. the RFC 5322 message, without the body), parse it, then pass both the resulting nsIMsgDBHdr AND the raw header chunk to the filtering code (assuming there are no filters which require the message body - if so filtering is deferred until the full message is downloaded).
But the current EWS implementation doesn't fetch the full raw header chunk. It instead (I think) receives structured data about the message as a XML object, containing a requested set of fields (to, from, subject, date etc...).
These fields are used to populate an nsIMsgDBHdr.
So the EWS new-message filtering has the nsIMsgDBHdr available, but NOT the raw header chunk (in RFC5322 format).
This means the user can't set up filters which match headers which aren't covered by nsIMsgDBHdr.
So, for example, this means you can't filter out mailing lists to a different folder (the "List-Id:" header).
EWS does retain all the mail headers (the full message download includes these - use "view source" on a message in an EWS folder to verify this), so the required data is there on the server.
To get filtering on custom headers working, the EWS implementation would need to provide all the headers to pass in to the filtering, not just the ones needed for populating an nsIMsgDBHdr.
So. Questions:
- Can EWS fetch raw RFC5322 header blocks instead of structured XML fields?
- OR, can we request all the mail headers for a message, then synthesize that into RFC5322 to pass into the filter code
- Is this sensible, or all a bit bonkers?
Whichever was things go, It'd be a lot more data to download for new messages than EWS currently sends over the wire - probably an order of magnitude. Modern email transports add a whole heap of bulky cruft into the headers.
But we seem to cope ok with this under IMAP et al, so it's probably fine.
(This could also fit in with the general goal of decoupling IEwsClient from nsIMsgDBHdr)
Comment 2•3 months ago
•
|
||
It looks like we should be able to request at least some headers from the EWS server: https://learn.microsoft.com/en-us/exchange/client-developer/web-service-reference/internetmessageheaders
However, looking at https://learn.microsoft.com/en-us/previous-versions/office/developer/exchange-server-2010/hh545614(v=exchg.140)#getting-internet-message-headers, even if we request this element, it doesn't look like there's an easy solution for getting the full header block as RFC 5322.
I agree it'd be good to get the full RFC 5322 header block so we can pass it to the filtering code (I think right now populating the nsIMsgDBHdr is still easier done using the structured data in the EWS response, considering how painful I remember integrating with our existing RFC 5322 parser to be), but it doesn't look to me like there's an easy solution that doesn't involve downloading the full message.
I'd say the next step should be investigating how helpful the InternetMessageHeaders property is (considering the doc says it might not include the full collection of headers, or needs some tweaking to do so), and how much of a pain it'd be to convert it into RFC 5322 (i.e. the solution you proposed in your second bullet point).
| Assignee | ||
Comment 3•3 months ago
|
||
(In reply to Brendan Abolivier [:babolivier] from comment #2)
(I think right now populating the
nsIMsgDBHdris still easier done using the structured data in the EWS response, considering how painful I remember integrating with our existing RFC 5322 parser to be)
Just as an aside, I've since written ParseMsgHeaders(), with the intention of replacing the old stateful, side-effect-replete parsing code.
https://searchfox.org/comm-central/rev/eb7989693310912851e0a39b2b4b572b14dca5f5/mailnews/local/src/nsParseMailbox.cpp#77 .
I'd say the next step should be investigating how helpful the
InternetMessageHeadersproperty is (considering the doc says it might not include the full collection of headers, or needs some tweaking to do so), and how much of a pain it'd be to convert it into RFC 5322 (i.e. the solution you proposed in your second bullet point).
Going from structured data to RFC5322 seems a bit sacrilegious, but hey. Sould be straightforward enough :-)
Looks like internetMessageHeaders isn't complete (e.g. missing "To:" and "From:"!), but there might be a was using PR_TRANSPORT_MESSAGE_HEADERS...
Technical Article here:
https://learn.microsoft.com/en-us/previous-versions/office/developer/exchange-server-2010/hh545614(v=exchg.140)
Comment 4•3 months ago
|
||
Do we have any telemetry that can help us understand how widely this is used? We may want to revisit this at a later date and hide the option for EWS. For the benefit of the subset of users who might use this, we're considering adding substantial bulk which I'm not sure is wise...so perhaps that puts me in the 3rd camp.
| Assignee | ||
Comment 5•3 months ago
|
||
Personally, I think this is a really important one. It's the difference between EWS being usable for mailinglists or not.
This is by far my number one use case for filtering. Probably 95% of the filters I've set up over the last 15 years have been on "List-Id:"...
| Assignee | ||
Comment 6•3 months ago
|
||
| Assignee | ||
Comment 7•3 months ago
•
|
||
Brendan: as discussed today, here's a proposed interface to pass headers over the XPCOM boundary.
The idea is that the IEwsClient would just send an IHeaderBlock object out to the folder, which can be used to both populate the database and turned into a raw RFC822 serialisation to pass to the filtering).
I've got a C++ implementation of IWritableHeaderBlock which could be instantiated from the rust side to pass out (am I right in thinking you can't directly implement XPCOM objects in rust? They need boilerplate in C++, right?).
(It's ended up pretty similar to the msgIStructuredHeaders interfaces, but those make some assumptions I'm not happy with, and tie themselves to the serialisation and parsing more than I'd like. But ultimately I'd probably aim to unify them).
| Assignee | ||
Updated•3 months ago
|
| Assignee | ||
Updated•3 months ago
|
| Assignee | ||
Updated•3 months ago
|
| Assignee | ||
Comment 8•2 months ago
•
|
||
XML example of adding PR_TRANSPORT_MESSAGE_HEADERS to the ItemShape of GetItem requests:
https://stackoverflow.com/questions/51178945/how-to-get-the-raw-content-of-an-outlook-email-message-with-office-js-api
| Assignee | ||
Comment 9•2 months ago
|
||
So, I'm including PR_TRANSPORT_MESSAGE_HEADERS in the GetItem request, but the response from the server causes a parsing error:
[ERROR ews_xpcom::safe_xpcom] an error occurred when performing operation SyncFolderItems: Ews(Deserialize(Error { path: Path { segments: [Map { key: "Body" }, Map { key: "GetItemResponse" }, Map { key: "ResponseMessages" }, Map { key: "$value" }, Seq { index: 0 }, Map { key: "Items" }, Map { key: "$value" }, Seq { index: 0 }, Enum { variant: "Message" }, Map { key: "ExtendedProperty" }, Seq { index: 0 }] }, original: Custom("missing field `extended_field_URI`") }))
My reading of the message is that the returned message holds ExtendedProperty values, but the deserialization is choking because they don't contain ExtendedFieldURI fields (or they're not being correctly mapped to extended_field_URI, for whatever reason)?
Any guidance on how to troubleshoot/fix this is welcome :-)
A few questions:
- Am I right in thinking that all the types are out in https://github.com/thunderbird/ews-rs, so if the definitions need tweaking that's the place to do it?
- Before I go looking, is there any logging in there for dumping out the raw XML going over the wire (both ways)?
- Are we relying on the incoming XML to be perfectly formed and to match our deserialisation definitions (in ews-rs)? If so, isn't that awfully optimistic and brittle?
Comment 10•2 months ago
•
|
||
(In reply to Ben Campbell from comment #9)
So, I'm including
PR_TRANSPORT_MESSAGE_HEADERSin theGetItemrequest, but the response from the server causes a parsing error:[ERROR ews_xpcom::safe_xpcom] an error occurred when performing operation SyncFolderItems: Ews(Deserialize(Error { path: Path { segments: [Map { key: "Body" }, Map { key: "GetItemResponse" }, Map { key: "ResponseMessages" }, Map { key: "$value" }, Seq { index: 0 }, Map { key: "Items" }, Map { key: "$value" }, Seq { index: 0 }, Enum { variant: "Message" }, Map { key: "ExtendedProperty" }, Seq { index: 0 }] }, original: Custom("missing field `extended_field_URI`") }))My reading of the message is that the returned message holds
ExtendedPropertyvalues, but the deserialization is choking because they don't containExtendedFieldURIfields (or they're not being correctly mapped toextended_field_URI, for whatever reason)?
It looks like the ExtendedProperty type is missing the #[serde(rename_all = "PascalCase")] annotation, so is trying to deserialize the field with a snake_case name instead. That annotation might not be enough though, since pascal case would render it as ExtendedFieldUri, so it might need a #[serde(rename = "ExtendedFieldURI")] annotation on the field too.
Any guidance on how to troubleshoot/fix this is welcome :-)
A few questions:
- Am I right in thinking that all the types are out in https://github.com/thunderbird/ews-rs, so if the definitions need tweaking that's the place to do it?
Yep.
- Before I go looking, is there any logging in there for dumping out the raw XML going over the wire (both ways)?
I think there's a way to log it to the terminal, but I always just open the dev tools and use the network tab. Because EWS is all over HTTP, all the requests and responses are there, headers and bodies.
- Are we relying on the incoming XML to be perfectly formed and to match our deserialisation definitions (in ews-rs)? If so, isn't that awfully optimistic and brittle?
Sort of. Anything we get but don't know about is ignored, anything we think might be present (but not necessarily) is marked optional, anything we expect to always be present but is missing is a deserialization error. In this case, the type expresses that we always expect a field named extended_field_URI to be present in that type, when we should instead always expect a field named ExtendedFieldURI. Maybe it's arguable we should mark everything optional so parsing never fails, but when we expect something to always be there and it isn't, that is something we should probably know about.
| Assignee | ||
Comment 11•2 months ago
|
||
| Assignee | ||
Comment 12•2 months ago
|
||
| Assignee | ||
Comment 13•2 months ago
|
||
So, following on from Comment 3, there are two options for returning a full set of headers - you can specify InternetMessageHeaders or PR_TRANSPORT_MESSAGE_HEADERS.
So, those two xml files are captured data where I request both of them:
- a
GetItemrequest, asking for bothInternetMessageHeadersandPR_TRANSPORT_MESSAGE_HEADERS(see<t:ExtendedFieldURI PropertyTag="0x007D" PropertyType="String"/>). - the response from the EWS server, which includes the
InternetMessageHeaderelements and thePR_TRANSPORT_MESSAGE_HEADERSdata, returned as anExtendedPropertyelement.
(I captured these via the raw network logging:
$ MOZ_LOG='ews_xpcom::*:5' THUNDERBIRD_LOG_NETWORK_PAYLOADS=1 ./mach run
)
InternetMessageHeaders is "nicer", as it's structured, where as PR_TRANSPORT_MESSAGE_HEADERS seems to just be a raw RFC5322 header block (with line folding and encoded fields, both of which require a little light parsing).
But, the InternetMessageHeaders is indeed missing the address headers ("From:" and "To:"), as suggested by https://learn.microsoft.com/en-us/previous-versions/office/developer/exchange-server-2010/hh545614(v=exchg.140) .
All the other header entries do appear to be included by "InternetMessageHeaders" in this example.
I think for now, I'm inclined to go with InternetMessageHeaders, and synthesize the missing address fields (we have all the needed details in the response). But if we find there are fields we cannot synthesize, we'll have to drop InternetMessageHeaders and use the PR_TRANSPORT_MESSAGE_HEADERS approach instead (i.e. raw RFC5322 header parsing, like we have to do for the other protocols).
Updated•23 days ago
|
Updated•8 days ago
|
| Assignee | ||
Comment 14•8 days ago
|
||
| Assignee | ||
Comment 15•8 days ago
|
||
| Assignee | ||
Comment 16•8 days ago
|
||
EwsMessageSyncListener already holds a little state. It'll need to hold some
more as we get ready to supply the complete set of message headers for
filtering. And if it's stateful, it's simpler all round to just have the
IEwsMessageSyncListener callbacks as methods rather than std::function
members.
| Assignee | ||
Comment 17•8 days ago
|
||
| Assignee | ||
Comment 18•8 days ago
|
||
| Assignee | ||
Comment 19•8 days ago
|
||
| Assignee | ||
Comment 20•8 days ago
|
||
| Assignee | ||
Comment 21•6 days ago
|
||
It's a stateful object which manages the Message Sync process, so "Handler"
seems to describe it better than "Listener".
Updated•6 days ago
|
Updated•6 days ago
|
| Assignee | ||
Comment 22•5 days ago
|
||
Updated•5 days ago
|
Updated•2 days ago
|
| Assignee | ||
Comment 23•2 days ago
|
||
Updated•1 day ago
|
Updated•1 day ago
|
Updated•1 day ago
|
Updated•1 day ago
|
Updated•1 day ago
|
| Assignee | ||
Updated•18 hours ago
|
Updated•18 hours ago
|
Updated•11 hours ago
|
Comment 24•7 hours ago
|
||
Pushed by kaie@kuix.de:
https://hg.mozilla.org/comm-central/rev/809d408c0384
Factor out ApplyRawHdrToDbHdr(). r=mkmelin
https://hg.mozilla.org/comm-central/rev/67ed32615fa6
Add IHeaderBlock interface. r=mkmelin
https://hg.mozilla.org/comm-central/rev/7ed20c3a3e73
Add ParseHeaderBlock() for IHeaderBlock->RawHdr parsing. r=mkmelin
Comment 25•7 hours ago
|
||
there are patches in progress, I assume you're still working on them, so reopening.
Description
•