Closed
Bug 295088
Opened 20 years ago
Closed 19 years ago
Filter/Search term evaluator is extremely sub-optimal
Categories
(MailNews Core :: Filters, enhancement)
MailNews Core
Filters
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: hyc, Assigned: hyc)
References
(Depends on 1 open bug, Blocks 1 open bug)
Details
(Keywords: fixed1.8.1, perf)
Attachments
(1 file, 4 obsolete files)
|
29.51 KB,
patch
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b2) Gecko/20050516 Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b2) Gecko/20050516 The term evaluator builds an an expression tree for each filter and then destroys it again on a per-message basis. So even if you have a very simple set of filters, if you're running them on a large folder there is a bunch of time wasted rebuilding and destroying the same expression structure over and over again. Instead, the expression tree should be constructed once, the first time the filter is used, and re-used for the entire duration of the in-progress operation, then destroyed in the Filter's destructor. For example, I have a filter "Age in days" "is greater than" "30" / Action is Delete, which I run in my trash folder periodically. I keep deleted messages just in case I might need them again. Typically I have around 2000 messages in each Trash folder on my two main email accounts. Expiring brings them down to around 1500, but it takes at least a minute to run this single filter. It shouldn't take this long. Reproducible: Always
| Assignee | ||
Comment 1•20 years ago
|
||
Another problem - the complete expression tree is always constructed, and values are assigned to all of its nodes, before the AND/OR evaluation occurs. It should not work in this order; it should only assign values to nodes that it actually visits, so that an AND/OR result that prunes a branch doesn't require any additional work. This is most critical if bug #223591 is fixed, because Junk classification will be one of these filter nodes and you don't want to start that up if you don't need to.
Blocks: 223591
| Assignee | ||
Comment 2•20 years ago
|
||
This patch keeps a single expressionTree hanging around for the various search/filter callers, to avoid unnecessayl malloc churn. It also fixes the term evaluation to allow shortcuts.
Comment 3•20 years ago
|
||
Since there's a patch in the works, might as well confirm this.
Assignee: nobody → hyc
Status: UNCONFIRMED → NEW
Ever confirmed: true
| Assignee | ||
Comment 4•20 years ago
|
||
Since the m_evalValue field of the nsMsgSearchBoolExpression was no longer being used, I deleted that and consolidated everything that was referenced by it. Also stuck my name in...
Attachment #184244 -
Attachment is obsolete: true
| Assignee | ||
Comment 5•20 years ago
|
||
Missed cleaning up a call to AddSearchTerm...
Attachment #184280 -
Attachment is obsolete: true
| Assignee | ||
Updated•20 years ago
|
Attachment #184281 -
Flags: review?(bienvenu)
Comment 6•20 years ago
|
||
Comment on attachment 184281 [details] [diff] [review] Cleanup one more loose end thx, Howard - this looks good. There's an open bug about search evaluation not being lazy, which this fixes, right?
Attachment #184281 -
Flags: superreview?(mscott)
Attachment #184281 -
Flags: review?(bienvenu)
Attachment #184281 -
Flags: review+
| Assignee | ||
Comment 7•20 years ago
|
||
(In reply to comment #6) > (From update of attachment 184281 [details] [diff] [review] [edit]) > thx, Howard - this looks good. There's an open bug about search evaluation not > being lazy, which this fixes, right? Right. I only searched for MailNews:Filter bugs before, so I missed any :Search bugs. I think you're talking about bug #166502 right?
Comment 8•20 years ago
|
||
yeah, that's the one, thx. When this gets checked in, could mark that one fixed as well? Dup'ing or marking it dependent on this doesn't seem quite right...
| Assignee | ||
Comment 9•20 years ago
|
||
(In reply to comment #8) > yeah, that's the one, thx. When this gets checked in, could mark that one fixed > as well? Dup'ing or marking it dependent on this doesn't seem quite right... I don't have general bug editing privs; if you want to assign that bug to me then I can track it.
| Assignee | ||
Updated•20 years ago
|
Status: NEW → ASSIGNED
Comment 10•20 years ago
|
||
(In reply to comment #9) > I don't have general bug editing privs; You do now :).
Comment 11•20 years ago
|
||
Doesn't yet Solve (second half of bug #166502) ordering clauses so most expensive (body text) operations are done last (so summary op's have chance to short-circuit it)? User merely must define/correct their clauses to be in optimal order to start with, now that there is an optimal order thankfully. Nor the "pie in the sky" common subexpression elimination when exact same clauses appear (reused) in multiple filters (mentioned in bug #154867)?
| Assignee | ||
Comment 12•20 years ago
|
||
(In reply to comment #11) > Doesn't yet Solve (second half of bug #166502) ordering clauses so most > expensive (body text) operations are done last (so summary op's have chance to > short-circuit it)? User merely must define/correct their clauses to be in > optimal order to start with, now that there is an optimal order thankfully. Right. I don't believe the code should automatically re-order things behind the users' back. We should just document that the order makes a difference. > Nor the "pie in the sky" common subexpression elimination when exact same > clauses appear (reused) in multiple filters (mentioned in bug #154867)? Right again. That would require walking through all of the filter terms to see if any of them are identical, then linking the terms in the expression trees. Finding the identical terms will be an inefficient process, but I guess it will be trivial in comparison to the filter execution time.
Comment 13•20 years ago
|
||
>> Doesn't yet Solve (second half of bug #166502) ordering clauses so most >> expensive (body text) operations are done last (so summary op's have chance to >> short-circuit it)? User merely must define/correct their clauses to be in >> optimal order to start with, now that there is an optimal order thankfully. >Right. I don't believe the code should automatically re-order things behind the >users' back. We should just document that the order makes a difference. If you're going to mark this as fixing #166502 (and it's dupe #154867), then shouldn't actually fix that bug? in what 'match all' search cases could the non-short-circuited verson produce a different result from the short-circuited version? It seems like a very unintuitive 'feature' to have to document, given how much trouble humans have with boolean logic at the best of times. I always used to try to combine subject and body searches to speed things up until I realised it was pointless, but I can't imagine a situation where I'd want 'from contains public@whitelabel.org' AND 'body contains bienvenu' to return different results from 'body contains bienvenu' AND 'from contains public@whitelabel.org' Especially in a world where a body search is an order of magnitude slower than summary. or have I got the smelly end of the stick here?
| Assignee | ||
Comment 14•20 years ago
|
||
(In reply to comment #13) > >Right. I don't believe the code should automatically re-order things behind the > >users' back. We should just document that the order makes a difference. > If you're going to mark this as fixing #166502 (and it's dupe #154867), then > shouldn't actually fix that bug? As far as I'm concerned, the *bug* is that it always fully evaluated the terms instead of doing lazy evaluation, and that bug is fixed in this patch. The fact that the code executes filters in the order that you ask it to is not a bug, IMO. That is correct behavior, and anything other than the order the user specifies is incorrect behavior. Anyway, if you disagree, I'm probably not the person you need to convince, so I'm going to shut up.
Comment 15•20 years ago
|
||
Great that this bug has solved precisely what it was opened for. I'd say at least keep one of the related bugs open for the body-last issue. So far as I know order of expression evaluation can't produce different results (except variations in elapsed time to when the current date is fetched, if the age reference date isn't already snapshot-ed once upfront). IMO, body-last just "does the right thing" transparently to the user, giving more users a more optimal runtime without teaching them how to use or how to optimize their existing filters. Still give a user-pref by which to disable it: When headers are longer than body, matches on headers take longer than body; some users might have such a special purpose need to configure for: Use exactly my ordering. Ideal to: A one-time reordering of the expression tree, that only needs to be done once, then the static tree can be evaluated for all messages. But is it easier to implement up to two passes over the tree per message: first to only evaluate non-body clauses, second for remainder? Skip the first pass if user-pref said to use exactly my ordering. Would cost per message of the 2nd tree walk and 2nd expression type tests, be considered significant enough to kill this easier implementation?
| Assignee | ||
Comment 16•20 years ago
|
||
(In reply to comment #15) > Great that this bug has solved precisely what it was opened for. I'd say at > least keep one of the related bugs open for the body-last issue. So far as I > know order of expression evaluation can't produce different results (except > variations in elapsed time to when the current date is fetched, if the age > reference date isn't already snapshot-ed once upfront). See, even without spending any time in the code you've already run into an exception. I wonder how many others there are, waiting to be discovered. > IMO, body-last just "does the right thing" transparently to the user, giving > more users a more optimal runtime without teaching them how to use or how to > optimize their existing filters. Still give a user-pref by which to disable it: > When headers are longer than body, matches on headers take longer than body; > some users might have such a special purpose need to configure for: Use exactly > my ordering. Lulling users into a false sense of security, relying on automagical processes to Do The Right Thing is a bad policy, especially when the code cannot determine what The Right Thing is in every case. You're looking at things from an extremely simplistic view of what the current filter UI allows. But the filter engine allows arbitrarily nested trees of boolean combinations, and at some point someone is going to fix the UI to allow users to create these complex filters. (Notice bug #59821) When people start using these complex filters, the simple reordering you're talking about here is going to be ineffective in the general case, and figuring out how to restructure a complex expression is not (IMO) worth the effort. The most important consideration in "fixing a bug" is to make sure that you actually fix something - that means that the result is always, consistently correct. If you introduce inconsistent behavior, then you have created a problem, not a solution. Trying to automagically restructure a users' input is guaranteed to meet with varying degrees of success. It will add an inordinate amount of complexity to the code, making it harder to understand and maintain, for an extremely small potential benefit that frequently will not materialize. Of course, if you have the code that implements what you ask for, and it works, feel free to post it here.
Comment 17•20 years ago
|
||
I see your point Howard, and perhaps it is not unreasonable to leave it as is. However, in the "real world" most people will not care about complex nested expressions (and in fact a UI which allows it will probably be more confusing). Most people just want a simple search like gmail allows which searches everything as quickly as possible. Is it possible to detect that this is a "simple" expression (ie. single level) and reorder for speed only in that case?
Assignee: hyc → nobody
Status: ASSIGNED → NEW
Comment 18•20 years ago
|
||
Whoops, sorry: accidentally clicked " Reassign bug to owner and QA contact of selected component " before Commit.
Comment 19•20 years ago
|
||
(In reply to comment #18) > Whoops, sorry: accidentally clicked " Reassign bug to owner and QA contact of > selected component " before Commit. Fixing.
Assignee: nobody → hyc
| Assignee | ||
Comment 20•20 years ago
|
||
(In reply to comment #17) > > Is it possible to detect that this is a "simple" expression (ie. single level) > and reorder for speed only in that case? That sounds reasonable. Unfortunately I've used up all the spare time I have for now, so I won't be able to pursue this for a while. > I see your point Howard, and perhaps it is not unreasonable to leave it as is. > However, in the "real world" most people will not care about complex nested > expressions (and in fact a UI which allows it will probably be more confusing). It seems that not a lot of people want it, but things like bug 274192 indicate that *some* people want it. Personally, I'd like to have a full UI for this. > Most people just want a simple search like gmail allows which searches > everything as quickly as possible. Adding AND/OR and Grouping to the UI won't prevent people from defining simple filters. They can just ignore the Grouping mechanism. (But I guess we can discuss the UI on 274192 or elsewhere.)
Status: NEW → ASSIGNED
Comment 21•19 years ago
|
||
Comment on attachment 184281 [details] [diff] [review] Cleanup one more loose end sr=dmose
Attachment #184281 -
Flags: superreview?(mscott) → superreview+
Comment 22•19 years ago
|
||
Comment on attachment 184281 [details] [diff] [review] Cleanup one more loose end Howard, the patch has bitrotted enough since May that it doesn't apply anymore. Could you please attach an updated version? $ patch -p0 < ~/moz/patches/fremde/295088.diff patching file public/nsMsgSearchBoolExpression.h patching file src/nsMsgFilter.cpp Hunk #2 succeeded at 171 (offset 15 lines). Hunk #3 succeeded at 182 (offset 15 lines). Hunk #4 succeeded at 246 (offset 15 lines). Hunk #5 succeeded at 395 (offset 15 lines). Hunk #6 succeeded at 516 (offset 16 lines). patching file src/nsMsgFilter.h Hunk #3 succeeded at 122 (offset 2 lines). patching file src/nsMsgLocalSearch.cpp patching file src/nsMsgLocalSearch.h patching file src/nsMsgSearchAdapter.cpp Hunk #2 succeeded at 787 (offset -15 lines). patching file src/nsMsgSearchSession.cpp Hunk #2 FAILED at 63. Hunk #3 succeeded at 68 (offset -4 lines). Hunk #4 succeeded at 93 (offset -6 lines). Hunk #5 succeeded at 103 (offset -6 lines). Hunk #6 succeeded at 647 (offset 3 lines). Hunk #7 succeeded at 753 (offset 11 lines). 1 out of 7 hunks FAILED -- saving rejects to file src/nsMsgSearchSession.cpp.rej patching file src/nsMsgSearchSession.h Hunk #2 FAILED at 108. 1 out of 2 hunks FAILED -- saving rejects to file src/nsMsgSearchSession.h.rej
| Assignee | ||
Comment 23•19 years ago
|
||
OK, this is a fresh diff from a current tree.
Attachment #184281 -
Attachment is obsolete: true
| Assignee | ||
Comment 24•19 years ago
|
||
| Assignee | ||
Updated•19 years ago
|
Attachment #200249 -
Attachment is obsolete: true
Updated•19 years ago
|
Attachment #200258 -
Attachment description: refreshed, with missing header diff → refreshed, with missing header diff [checked in]
Comment 25•19 years ago
|
||
So nsMsgSearchSession.cpp lacked an #include nsMsgSearchBoolExpression.h and some trees got burned down, but no other harm was done. ;-)
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Comment 26•19 years ago
|
||
This bug here might have caused Bug 313357, maybe the patch author and/or reviewers can take a look at it.
Comment 27•19 years ago
|
||
*** Bug 330816 has been marked as a duplicate of this bug. ***
Updated•19 years ago
|
Keywords: fixed1.8.1
Comment 28•18 years ago
|
||
*** Bug 357083 has been marked as a duplicate of this bug. ***
Updated•16 years ago
|
Product: Core → MailNews Core
You need to log in
before you can comment on or make changes to this bug.
Description
•