Closed Bug 554294 Opened 14 years ago Closed 14 years ago

Gloda should expose a way to determine if an attachment is a "real" one

Categories

(MailNews Core :: Database, enhancement)

enhancement
Not set
normal

Tracking

(blocking-thunderbird3.1 rc1+, thunderbird3.1 rc1-fixed)

RESOLVED FIXED
Thunderbird 3.1rc1
Tracking Status
blocking-thunderbird3.1 --- rc1+
thunderbird3.1 --- rc1-fixed

People

(Reporter: protz, Assigned: protz)

Details

Attachments

(1 file, 2 obsolete files)

User-Agent:       Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.3a3) Gecko/20100322 Minefield/3.7a3
Build Identifier: 

I have a function

function MimeMessageHasAttachment(aMsg)
   aMsg.allAttachments.some(function (x) x.isRealAttachment);

This function returns true even when the attachment is a forwarded .eml file. I don't know if there's a clean way to detect this case, but that'd be great!

http://mxr.mozilla.org/comm-central/source/mailnews/db/gloda/modules/mimemsg.js#540 seems to be the relevant part.

Reproducible: Always
Ready the idl file. Is real attachment is there to detect if the attachment as been detached, linked etc ... It doesn't detect that it's an eml. That would be a feature request. Detecting attached messages.
Changing the subject and marking as enhancement. To further clarify things: by "real" attachment, I mean every attachment that's not fake, and by "fake", I mean:
- .eml forwarded messages or quoted messages,
- signatures (.asc),
- vcards,
- footers,
- etc.

(i.e. anything that is already displayed inline when reading the message).

I'm not sure this is the best way to classify that. I'm not even sure that it's possible to have sound semantics for classifying that way.
Severity: normal → enhancement
Summary: MimeMessageAttachment.isRealAttachment returns true for forwarded emails as .eml attachments → Gloda should expose a way to determine if an attachment is a "real" one
Thank you for filing the bug.

It might be appropriate to grow an additional helper attribute at some point when we are able to phrase a distinctly different use case.

Probably the easiest way to fix your current use case is to just check that 1) if the attachment has a filename 2) that the extension ends in .eml.

The unit test that would want to check this out is:
http://mxr.mozilla.org/comm-central/source/mailnews/db/gloda/test/unit/test_mime_emitter.js
Status: UNCONFIRMED → NEW
Ever confirmed: true
Ok, more information on this bug. The initial problem was: inline MIME quoted parts were understood as real attachments for some messages. Background information: isRealAttachment is supposed to filter out those cases using a simple test: does the part name start with "Part"? If true, then it's an inline part, not a real attachment. If it doesn't, the part has a real filename, it's an attachment.

This works OK most of the time. However, Outlook 2003 uses a LOCALIZED NAME for parts. Instead of having "Part1.2" just like anyone else, they write "Partie 1.1" ("partie" being French for "part") which is why the isRealAttachment call failed to return false for these... Maybe we should try a regexp that matches for something like "\s\S?([0-9]+\.)*([0-9]+)" ? That's a little bit risky.

Any ideas? How does the regular Thunderbird code do that?
My mistake. It's actually *Thunderbird* that localizes the part names. I just fired up a French thunderbird and now I have attachments that have isRealAttachment set to true because they are called "Partie ...". I'll try to come up with a patch to the Gloda logic as soon as I have found where the string is defined.
Ok, here's a patch that seems to do the trick. Some remarks:
- there's ugly constants around, so I copied the definition of the relevant one from http://mxr.mozilla.org/comm-central/source/mailnews/mime/src/nsMimeStringResources.h#72
- nsIStringBundle has a method formatStringFromID but the IDL (see http://mxr.mozilla.org/comm-central/source/mozilla/intl/strres/public/nsIStringBundle.idl#85) warns explicitely against using anything else but %S formatters, and indeed it doesn't work if I call that given that the .properties files use %s formatters. So I resorted to a silly String.replace.
- The .properties file are not accessible by the time mimemsg.js is loaded which is why I test first for the string and then fetch it the first time it is needed.
Attachment #441282 - Flags: review?(bugmail)
Thank you for the patch and investigation work!

Hm, so if localization is breaking us, this is very bad since it means that gloda will think all kinds of things have attachments that really don't when using a sufficiently localized version of Thunderbird.  I am bumping this to an rc1 blocker since the fix is pretty easy and this will seriously impact gloda search results.

While I'm not really above hacks when it comes to libmime, we could probably also just change libmime to not use localized names for the parts when it knows it is talking to the jsemitter.

It looks like here is where the code gets it in its head to do something localizable:
http://mxr.mozilla.org/comm-central/source/mailnews/mime/src/mimemoz2.cpp#430

This is where we determine that we are dealing with the JS emitter and set some special flags:
http://mxr.mozilla.org/comm-central/source/mailnews/mime/src/mimei.cpp#1561

We also have the interesting ability here to just not generate a fake name for the attachment at all, which might be the best thing to do?

What are your thoughts on this alternate approach, Jonathan?
blocking-thunderbird3.1: --- → rc1+
The reason why the code does something localizable is that you often see in the messagepane, in the attachments list, things like "Part 1.1". BTW, this is very unfortunate, those should not be exposed to the end user, they are not "real" attachments!. (Just to make things clear: see http://jonathan.protzenko.free.fr/attpart.png to see what I mean.)

The advantage of tweaking mimemoz2.cpp is that it's like a "no-surprises" policy, in the sense that we will be safe when someone decides to fix the issue mentioned above (and possibly changes the naming scheme, or whatever).

If we tweak it so that it sends some special pre-defined name like NOT_A_REAL_ATTACHMENT, the drawback is that if someone sends an attachment named "NOT_A_REAL_ATTACHMENT" it will be discarded by Gloda. That's minor, and we could come up with a sufficiently convoluted name, but someone might think of a wicked use of this feature.

So I'd rather be in favor of adding the part name in the attachment name, but simply not localized. So basically, discarding my patch and changing

433     tmp->real_name = MimeGetStringByID(MIME_MSG_DEFAULT_ATTACHMENT_NAME);

Into something like

if (options->dont_localize_part_names) then
  tmp->real_name = "Part %s";
else
  tmp->real_name = MimeGetStringByID(...);
While I'd like us to land this bug for 3.1, I'm having a hard time convincing myself that we'd hold for this if it were the last bug standing.  asuth, your thoughts?
Giving to protz, since he's already iterating on a patch...
Assignee: nobody → jonathan.protzenko
I think this is important: it turns out that for every non en-US Thunderbird, all messages that have "Part 1.2"-like attachments (and there's quite a lot!) are thought by Gloda to have real attachments. This gives very wrong search results when checking "must have attachment"...

We can either:
i) checkin the patch I submitted. While it's a bit ad-hoc, I believe it's correct.
ii) hack into libmime as suggested by comment 8 but as I don't have much experience with hacking C++ parts, that would definitely not be ready for rc1.

Andrew, what's your take on this?
> We also have the interesting ability here to just not generate a fake name for
> the attachment at all, which might be the best thing to do?

This is what I was thinking - we don't want to see Part 1.2 in the attachment pane anyway. I'm not sure if there is a downside to this approach.
That does sounds like a better fix, indeed, sorry I didn't catch that suggestion when triaging here.  Like bienvenu, I'm not aware of downsides, but that's not to say that there aren't any.  

I suspect the easiest way forward here would be just to generate a patch and tryserver build using that approach, and then we can play with it as well as ask clarkbw for feedback once he's back in the office.
Comment on attachment 441282 [details] [diff] [review]
[checked in with r=asuth] Proposed patch http://hg.mozilla.org/releases/comm-1.9.2/rev/d569cd87c246 http://hg.mozilla.org/comm-central/rev/0315af2cc3b4

Right, so it sounds like the two goals are:

1) Have gloda not need to reverse engineer that the attachment doesn't really have a name.  I was thinking passing null or an empty string in such a case.

2) Have the message reader not show the part, maybe ignoring attachments so characterized.

3) Have neither consumer die horribly trying to look inside a null object.  (This is presumably not tricky, just needs to be done.)

As such I'm obsoleting this patch.

Jonathan, will you be able/do you feel comfortable generating such a patch for the rc1 timeframe or would you like me to take over the bug?
Attachment #441282 - Attachment is obsolete: true
Attachment #441282 - Flags: review?(bugmail)
One additional concern:

4) Make sure the message list doesn't display the "attachment" icon when there are only Part 1.2-like attachments.

I dont know where to start for that last point. Do you think that fixing the regular consumer will also fix this as a side effect?
(In reply to comment #15)
> 4) Make sure the message list doesn't display the "attachment" icon when there
> are only Part 1.2-like attachments.
> 
> I dont know where to start for that last point. Do you think that fixing the
> regular consumer will also fix this as a side effect?

per bienvenu, the message reader will remove the bit that says "attachments!" if it streams a message for display and doesn't hear about any attachments but the bit claims they are there (because the IMAP headers led it to believe there were attachments.

I forget what bug he said that on recently, but he did.
(Adding Patrick from Enigmail)

Patrick, is enigmail going to have any trouble if we go ahead with this?  (Where 'this' is to generally hide un-named attachments from the message reader.)
Comment on attachment 442657 [details] [diff] [review]
New iteration of the patch following the latest suggestions

As discussed on IRC, this needs a test in gloda's test_mime_emitter.js.  (I have not run the gloda tests with this patch; I am assuming they currently pass with these changes...)

I fear a test case will also need to be added to mail/test/mozmill/attachment/test-attachment.js too.

Otherwise, I'm very happy with this patch!

bienvenu will need to review this patch too once the tests situation is taken care of.

Let me know if you have any trouble with the tests.

on file: mailnews/db/gloda/modules/fundattr.js line 561
>         aMsgHdr.markHasAttachments(false);

I checked and this does bail before generating any spurious notifications, so
it's fine to call this without checking whether the flag was previously set. 
(Thought I'd mention this since it raised a flag for me.)


on file: mailnews/mime/emitters/src/nsMimeBaseEmitter.cpp line 400
>   if ( name.Length() > 0 )

nit: no padded spaces inside the parens; I think the space in the case below
is likely just to separate the two parens from each other and is against
general mailnews style.
Attachment #442657 - Flags: review?(bugmail) → feedback+
Enigmail should be fine, as long as the complete mime tree is available through MsgHdrToMimeMessage()
Whiteboard: [patch ready tomorrow or friday]
Attachment #442657 - Attachment is obsolete: true
Here's a new patch with an added mozmill test. I just added in the testcase one of those MIME parts that shouldn't show up as attachments anymore. Test passes so far. I can still correct a few details if needed before the patch is checked-in.
Attachment #443574 - Flags: review?(bugmail)
Whiteboard: [patch ready tomorrow or friday] → [patch ready]
(In reply to comment #2)
> Changing the subject and marking as enhancement. To further clarify things: by
> "real" attachment, I mean every attachment that's not fake, and by "fake", I
> mean:

I would definitely agree that we should have a list somewhere that determines the following items as "fake" attachments.

 - .eml forwarded messages or quoted messages,
 - signatures (.asc),
 - vcards,
 - footers,
 - Part $X attachments

I'm not sure this is a definitive list at this time, it would be good to be able to update the list in the future to add more items.  These "fake" items should not appear in the message list view, the message reader, nor in gloda search results as attachments.

The semantics for this list are always going to be somewhat fuzzy as we're trying to capture what people actually feel are attachments and not what the protocol tells us.
Signatures (.asc) files are real attachments that should be allowed to open or save. You should check the content-type: "application/pgp-signature" is a "fake" attachment.
(In reply to comment #23)
> Signatures (.asc) files are real attachments that should be allowed to open or
> save. You should check the content-type: "application/pgp-signature" is a
> "fake" attachment.

Yes, it seems like we still want the signature to be accessible via the UI.  This suggests it is appropriate to split the convenience attributes at this time to cover the two situations:

1) Is this part an explicit attachment like a document the user attached?

2) Is this part something the user would have a reason to want to manipulate?  .eml messages, signatures, and vcards all seem good candidates for manipulation.  Mailing list gunk footers and similar un-named appended parts seem equally valid to ignore.

We would probably want to use #1 inside #2 in the UI in order to figure out whether to treat the part as meta-data / a transmission artifact or not.

In terms of this patch, if I understand things correctly per RFC 2015, the PGP signatures may be un-named and thus get Part 1.# names and so this patch will break that?  ( http://www.ietf.org/rfc/rfc2015.txt )
(In reply to comment #24)
> Yes, it seems like we still want the signature to be accessible via the UI. 
> This suggests it is appropriate to split the convenience attributes at this
> time to cover the two situations:

Er, maybe not "at this time", since we're right up against an rc1 and the message reader doesn't use the JS mime emitter abstraction anyways.
If the signature is unnamed, this will not show up in the message reader UI either. My patch touches nsMimeBaseEmitter.cpp (or whatever the name is) that is used by the message reader to get a list of attachments to display, and I just skip unnamed attachments.
(In reply to comment #26)
> If the signature is unnamed, this will not show up in the message reader UI
> either. My patch touches nsMimeBaseEmitter.cpp (or whatever the name is) that
> is used by the message reader to get a list of attachments to display, and I
> just skip unnamed attachments.

Indeed.  I fear we are going to have to lose that part of the patch and make this patch only affect the gloda/jsmimeemitter use case.  I don't think there's any way we can get enough exposure for the change given the proximity of the release candidate.
http://mxr.mozilla.org/comm-central/source/mailnews/mime/src/mimemoz2.cpp#1801 will need to be changed to use a ternary operator to use EmptyCString() if 'name' is null.  As NeilAway pointed out and I just looked into, variations of strlen can get invoked and the man page doesn't say strlen(NULL) is defined to be cool.
(In reply to comment #24)
> In terms of this patch, if I understand things correctly per RFC 2015, the PGP
> signatures may be un-named and thus get Part 1.# names and so this patch will
> break that?  ( http://www.ietf.org/rfc/rfc2015.txt )

RFC 2015 has been superseded by RFC 3156, but yes you're right. A signature can have an optional name, but doesn't have to. Many but not all mail clients I know do a name.
This is getting a little bit beyond the scope of the original patch. Andrew, what about:
- committing my patch that reverse-engineers the localized name for "Part %d" on comm-1.9.2: no risk, doesn't destroy the message reader, fixes that important issue of messages with no attachments showing as "with attachments" in search results,
- I provide a better patch that takes comment #28 and comment #23 into account for comm-central, and we wait to see how it evolves there without compromising rc1
?

That way, we resolve the issue in a safe but hackish way for 3.1 and we try do to something better but potentially more risky for 3.2, and that leaves us the time to see how it goes.
You make a compelling argument; I'll do that.  Sorry for the contradictory feedback about the patch!
pushed to comm-1.9.2:
http://hg.mozilla.org/releases/comm-1.9.2/rev/d569cd87c246

pushed to comm-central:
http://hg.mozilla.org/comm-central/rev/0315af2cc3b4
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 3.1rc1
Attachment #441282 - Attachment description: Proposed patch → Proposed patch http://hg.mozilla.org/releases/comm-1.9.2/rev/d569cd87c246 http://hg.mozilla.org/comm-central/rev/0315af2cc3b4
Attachment #441282 - Attachment is obsolete: false
Thanks for the patch!  It has been pushed with a minor naming change and the addition of a small comment.

The outstanding issues are going to need a new bug(s)
Whiteboard: [patch ready]
Attachment #441282 - Attachment description: Proposed patch http://hg.mozilla.org/releases/comm-1.9.2/rev/d569cd87c246 http://hg.mozilla.org/comm-central/rev/0315af2cc3b4 → [checked in with r=asuth] Proposed patch http://hg.mozilla.org/releases/comm-1.9.2/rev/d569cd87c246 http://hg.mozilla.org/comm-central/rev/0315af2cc3b4
Comment on attachment 443574 [details] [diff] [review]
Same patch as before, but with mozmill tests

Removing obsolete review request.
Attachment #443574 - Attachment is obsolete: true
Attachment #443574 - Flags: review?(bugmail)
Followup on bug 564423
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: