Closed Bug 868233 Opened 12 years ago Closed 9 years ago

Allow XPCOM access to body handler

Categories

(MailNews Core :: Search, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: rkent, Assigned: opera.wang)

Details

Attachments

(2 files, 1 obsolete file)

A very common request that people have in custom searches and filters is to search the actual message body. In built-in filters, this is done using a helper class nsMsgBodyHandler. To ease searches involving the body, make that class available as a js-accessible XPCOM object.
Assignee: nobody → kent
Severity: normal → enhancement
Status: NEW → ASSIGNED
Attached patch WIP rev1Splinter Review
My private conversation with Opera Wang: Opera: The patch C++ code requires you to download the source code of TB, patch it and then rebuild whole TB, it would be hard if you were not an IT. Also the patch is about half year ago which means it is a bit rotted ( can't get patched directly, & needs some merging effort ). ukrainianconsular: It means if i use the patch, i should never update thunderbird & condemn thunderbird to the latest actual stable build & never update it again, since i can't expect devs at mozilla to incorporate it for good .. . could you please add it, since i have no coding knowledge (( Opera: As I said, without the patch, it's (almost) impossible to do regex search in the body. I actually tried it once, with a lot of dirty code, and still can't make it work without the possibility of crashing TB.
Unfortunately once someone starts down the path of comments like "the man seems to have an ego & practices abuse of trust pretending" (Bug 19442 comment 71) then I assume that all expectations of any response from me or cooperation with me are gone. So at this point any communication between me and Mr. Consular seems to be impossible. But I would still be willing to pursue this with a friendly third party. I'd completely forgotten that I had done this WIP patch, which I suppose is why I posted it here in the first place, knowing I would forget. The patch should not be all that controversial. It simply adds capability for a javascript-based extension to hook into some code that would allow extensions such as FiltaQuilla to implement much-requested functionality. But I don't recall if the patch works. What this needs at this point is for someone to compile and test it with a recent TB build, and confirm that it does what is expected. There might still be time to move this forward prior to Thunderbird 31, though that window is fairly narrow. But in the meantime I will unassign myself since I am not actively working on this.
Assignee: kent → nobody
Status: ASSIGNED → NEW
"Unfortunately once someone starts down the path of comments like "the man seems to have an ego & practices abuse of trust pretending" (Bug 19442 comment 71) then I assume that all expectations of any response from me or cooperation with me are gone" Wrong Kent .. . They were gone since i've made that request 2 years ago, & you've then told me you didn't have time for that after pretending it was "easy". if it's not abuse of trust, then the term "easy" was for my part, displaced, cause it gives good hopes to the person who contacts you. Then you ignored all my attempts to re-ask that fix .. . But if you ever had a sincere will, i'll address you my apologize today.
Attached patch bug_868233.patch (obsolete) — Splinter Review
This is the patch based on Kent's patch with small modifications: 1. unbitrotted 2. use localAccountUtils for the test case so it can pass 3. Add one more interface 'AUTF8String read(in unsigned long aCount)', if extensions author want to look into the raw body (jsmime?), this interface might be useful, as 'getNextMessageLine' will strip all attachments. With this patch, Extensions like FiltaQuilla or 'Expression Search / GmailUI' can doing more complex filter like 'regular expression search for body' (bug 19442), or 'filter by attachment name' (bug 224392) I also modified my addon to verify the functional of this patch and seems it works. BTW, I work on this just because I'm the maintainer of 'Expression Search / GmailUI' and now I'm using a dirty method (thread.processNextEvent) to query body which might crash TB.
Attachment #8379618 - Flags: review?(kaie)
I wonder which mozilla devs are following this thread, maybe: kaie@kuix.de & mkmelin+mozilla@iki.fi Whoever is willing to update thunderbid's nightly or stable build with rkent's patch updated by opera wang, please let me know when this process will be done. & i guess filtaquilla would need an update too in order to work with that patch cause rkent uses personalized "internet headers properties" of search like bccList: ccList: & recipients: Like the character set is usually found in the "Content-Type:" header, & his property for that is "msgCharSet:" So i guess a "body:" or "msgBody:" property has to be added yet.
Comment on attachment 8379618 [details] [diff] [review] bug_868233.patch change reviewer to :Irving Reid as Kai is mainly focused on security. And please assign to me.
Attachment #8379618 - Flags: review?(kaie) → review?(irving)
Assignee: nobody → opera.wang
Status: NEW → ASSIGNED
OS: Windows 7 → All
Hardware: x86_64 → All
Comment on attachment 8379618 [details] [diff] [review] bug_868233.patch Review of attachment 8379618 [details] [diff] [review]: ----------------------------------------------------------------- Can you solve the same problem with nsIMsgFolder::getMsgInputStream(), possibly combined with nsIMsgFolder::getMsgTextFromStream()? If so, I'd rather support that instead of exposing another interface. http://dxr.mozilla.org/comm-central/source/mailnews/base/public/nsIMsgFolder.idl#550 http://dxr.mozilla.org/comm-central/source/mailnews/base/public/nsIMsgFolder.idl#740 Aside from that the patch is fine, with a few white space fixups - I'm going to give f+ rather than r+ until we decide what to do about the duplication of APIs. ::: mailnews/base/search/public/nsMsgBodyHandler.h @@ +14,5 @@ > //--------------------------------------------------------------------------- > // nsMsgBodyHandler: used to retrieve lines from POP and IMAP offline messages. > // This is a helper class used by nsMsgSearchTerm::MatchBody > //--------------------------------------------------------------------------- > +class nsMsgBodyHandler : public nsIMsgBodyHandler Please remove the trailing white space from this line @@ +69,5 @@ > > nsIMsgSearchScopeTerm *m_scope; > nsCOMPtr <nsILineInputStream> m_fileLineStream; > nsCOMPtr <nsIFile> m_localFile; > + nsCOMPtr<nsIInputStream> m_inputStream; Space between nsCOMPtr and <nsIInputStream> to match style of lines above ::: mailnews/base/search/src/nsMsgBodyHandler.cpp @@ +32,5 @@ > + m_headers = NULL; > + m_headersSize = 0; > + m_Filtering = false; // make sure we set this before we call initialize... > + InitLocal(scope, numLines, msg, db); > + Please remove spaces from this line @@ +57,5 @@ > + return NS_OK; > +} > + > +NS_IMETHODIMP nsMsgBodyHandler::Init(nsIMsgDBHdr* aMsgHdr, bool aStripHeaders, bool aStripHtml) > +{ White space @@ +85,4 @@ > > + OpenLocalFolder(); > + return NS_OK; > +} White space ::: mailnews/base/test/unit/test_searchCustomBody.js @@ +76,5 @@ > +{ > + localAccountUtils.loadLocalMailAccount(); > + MailServices.filters.addCustomTerm(customTerm); > + > + var copyListener = White space
Attachment #8379618 - Flags: review?(irving) → feedback+
white spaces change. Also I did a quick test for nsIMsgFolder::getMsgInputStream() + nsIMsgFolder::getMsgTextFromStream(), it can give me the text part of the body, but can't get the msg header or attachment part, and seems can't get the html part too. So I still would like to have this interface. An alternate way is to make nsIInputStream::read scriptable?
Attachment #8379618 - Attachment is obsolete: true
Attachment #8385858 - Flags: review?(irving)
Joshua, Mark, Kent - what do you think about making nsIInputStream::read() scriptable?
Flags: needinfo?(standard8)
Flags: needinfo?(kent)
Flags: needinfo?(Pidgeot18)
(In reply to :Irving Reid from comment #10) > Joshua, Mark, Kent - what do you think about making nsIInputStream::read() > scriptable? There is no need to: nsIScriptableInputStream is the proper way to go. (Although there is a mild amount of issues if you have messages with embedded NUL characters and you don't code carefully). (In reply to opera wang from comment #9) > Also I did a quick test for nsIMsgFolder::getMsgInputStream() + > nsIMsgFolder::getMsgTextFromStream(), it can give me the text part of the > body, but can't get the msg header or attachment part, and seems can't get > the html part too. So I still would like to have this interface. If you want headers, then nsIMsgBodyHandler is absolutely not what you want. You can use nsIMimeHeaders if you want a sucky interface or mimeParser.jsm if you want a non-sucky interface. If you want both the text and the HTML part, you want to truly speak mime, and the body handler is the wrong thing for you. JSMime or GLoda's MimeMsg are arguably both better features. In fact, when JSMime 0.2 lands, JSMime will be able to replicate libmime's magic charset conversion functionality which does far better than any of our crappy ad-hoc MIME parsers. The downside is you have to figure out which MIME parts are body parts yourself. I see no compelling reason to make this interface public if nsIMsgFolder::getMsgInputStream exists.
Flags: needinfo?(standard8)
Flags: needinfo?(kent)
Flags: needinfo?(Pidgeot18)
Thanks Joshua, your suggestions helps me a lot. Kent, my preliminary test shows that both NetUtil.readInputStreamToString or getMsgTextFromStream works, while streamMessage + getMsgTextFromStream crash TB bodyRegex.match = function _match(aMsgHdr, aSearchValue, aSearchOp) { let folder = aMsgHdr.folder; let got; if ( 0 ) { // will crash let messenger = Cc["@mozilla.org/messenger;1"].createInstance(Ci.nsIMessenger); let listener = Cc["@mozilla.org/network/sync-stream-listener;1"].createInstance(Ci.nsISyncStreamListener); let uri = folder.getUriForMsg(aMsgHdr); messenger.messageServiceFromURI(uri).streamMessage(uri, listener, null, null, false, ""); got = folder.getMsgTextFromStream(listener.inputStream, aMsgHdr.Charset, aMsgHdr.messageSize /*read*/, aMsgHdr.messageSize /*max output*/, false/*compressQuotes*/, true/*strip HTML*/, { }/*contentType*/); } else { let result = {}; let stream = folder.getMsgInputStream(aMsgHdr, result); let mimeType = {}; try { //got = NetUtil.readInputStreamToString(stream, aMsgHdr.messageSize); // read the whole message got = folder.getMsgTextFromStream(stream, aMsgHdr.charset || '', aMsgHdr.messageSize, aMsgHdr.messageSize, false, false, mimeType); dump("mimeType:" + mimeType.value); } catch (err) { } if ( !result.value ) stream.close(); // close if not reusable } return false; };
(In reply to opera wang from comment #12) > let uri = folder.getUriForMsg(aMsgHdr); > messenger.messageServiceFromURI(uri).streamMessage(uri, listener, > null, null, false, ""); I'm guessing the crash happens somewhere in this time frame, which isn't surprising, since the message header hasn't been added to the database at this point in time yet.
I'm try the 'quick filter' mode, not 'Message Filters', so the issue might be: The streamMessage/nsISyncStreamListener call processNextEvent or just suspend current thread and return. But the match for the current header is not finished yet, that will cause re-enter issue for nsMsgSearchOfflineMail::Search(bool *aDone) @ nsMsgLocalSearch.cpp and crash
If you can accomplish a body regex custom search successfully with getMsgTextFromStream then I have no issues with abandoning this bug, and just using that.
Getting rid of filtaquilla which becomes now obsolete to (regex) filter by headers to from subject & the missing one: body Many thanks to opera wang to introduce all those options in the addon he is maintaining: Expression Search / GMailUI 0.8.8 beta http://www.sendspace.com/file/dsfhu4
The 2 addons are for different purpose so no one is obsolete. But anyone who is interested can try the beta version and maybe Kent can add to filtaquilla later. Thanks.
Comment on attachment 8385858 [details] [diff] [review] bug_868233_v2.patch Review of attachment 8385858 [details] [diff] [review]: ----------------------------------------------------------------- (In reply to Joshua Cranmer [:jcranmer] from comment #11) > (In reply to :Irving Reid from comment #10) > > Joshua, Mark, Kent - what do you think about making nsIInputStream::read() > > scriptable? > > There is no need to: nsIScriptableInputStream is the proper way to go. ... > I see no compelling reason to make this interface public if > nsIMsgFolder::getMsgInputStream exists. I agree; if getMsgInputStream + nsIScriptableInputStream provides equivalent access to the message data, we don't need nsIMsgBodyHandler.idl. See http://dxr.mozilla.org/comm-central/source/mailnews/test/resources/mailTestUtils.js#73 for the one place we test that code path... As Joshua points out, you really need JSMime to get the character encodings etc. correct, but that would be true of either approach. I'm clearing the review request for now; we can bring this back up if getMsgInputStream doesn't work out.
Attachment #8385858 - Flags: review?(irving)
Hey Opera, i'm missing this to be more precise: contains doesn't contain is isn't & not "matches" "doesn't match" for regular expressions. I'm using expression search 0.8.8 beta. ex: if i want to create a regex rule where there's only urls in the body, i can't use "is" as well as i want to have a "contains" option for regex, so it doesn't ignore everything else. Thanks
Hey Opera, there’s a serious issue. gmail ui is not recognizing my regular expression that works everywhere else online... Expression: (?ism)([>]|\s|^|"|<<)(-?i|i'll|i'd|i'm|im|id)([<]|\s) you can see a test here http://regexr.com/v1?38mmb step: create an email with for ex: Im In the body & create for body (regex) a filter to move it to trash (?ism)([>]|\s|^|"|<<)(-?i|i'll|i'd|i'm|im|id)([<]|\s) Nothing happens..
Not needed since we have a working alternative.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: