Closed Bug 697522 Opened 8 years ago Closed 7 months ago

precise logging of message filter runs and actions

Categories

(MailNews Core :: Filters, enhancement)

enhancement
Not set

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 68.0

People

(Reporter: aceman, Assigned: aceman)

References

(Blocks 5 open bugs)

Details

Attachments

(1 file, 3 obsolete files)

I have one 1 top issue that would help us triage one class of bugs which
are currently hard. There are reports that are uncertain whether filters were
even run, on what folders etc. It is not sure whether the filters weren't run
or they just didn't match anything. We would really need some way to be sure about
this. The existing Filter log is almost unusable for this, as it only logs when
any filter matched some message.

So my proposal is to create some better Filter log (either in the existing Filter
log window, or as a new value into NSPR_LOG_MODULES) with at least these properties:
+ There is 1 block of information for each run of filters (with timestamp).
++ There is then 1 block of data for each account where filters are run
   (probably only those that have new msgs or where manual run). The block contains this data:
++ How were filters invoked (automatic run, manual, automatic after classification).
++ Which account is subject to the filter run (important for merged/global inbox).
++ The type of the account (like POP, IMAP, News, Local, RSS).
++ How many enabled/disabled filters are being run (which are applicable
   depending on the invocation type above).
+++ Then, for each folder (or newsgroup) in this account there would be this block of data:
+++ In case of newsgroup, show the filter count again, as they can be defined per-newsgroup.
+++ How many messages are subject to these filters (count of new messages for automatic,
    count of all messages if manual).
+++ List of filter hits in similar format to the existing one (in Filter log) - which
    filter matched, on which message (subject) and what action was taken.
+++ End of block '+++'
++ End of block '++'
+ End of block '+'

This would help us find out if there are cases where filters really do not run or why
people have the feeling they do not run.

Of course, if the internal loops are structured in some other logic (the proposal is based on "for each account, run all filters there, next"), feel free to restructure the log as best fits the internal logic.
Blocks: 567986
Blocks: 537224
Blocks: 622307
Blocks: 552936
I am marking this bug as blocking the bugs that I would be able to triage/debug better using this logging.
Message filter log of Tb is HTML data and is currently saved in filterlog.html. So, for "simple reading log", "simple searchig string" and so on, "View filterlog.html by your favarite Browser" is sufficient.

Problems in current message filter logging;
(a) message filter log is buffered, so last part can't be viewed by "copy filterlog.html file +
    Browser".
    (a-1) "Force buffered filter data write" function is needed.
(b) (b-1) no timestamp is written in filter log.
    (b-2) filter log is written before actual execution of "Copy to copy target folder" and/or
          "Move to move target folder".
    So, message filter log is useless for problem analysis of problem like "filter move failure".
    Because of "no timestamp", Tb users can't know "when filter was applied to a mail" by log. 
    Improvement of (b-1) is not so hard, but not so easy, because of C++ code && localization.
>     http://mxr.mozilla.org/comm-central/search?string=Applied+filter&find=mail&findi=&filter=^[^\0]*%24&hitlimit=&tree=comm-central
>     http://mxr.mozilla.org/comm-central/ident?i=filterLogDetectStr
>     http://mxr.mozilla.org/comm-central/source/mailnews/base/search/src/nsMsgFilter.cpp#536
    Improvement of (b-2) is pretty hard, although "never impossible".
(c) Current filter log data is insufficient for trouble shooting purpose, because "to identify
    filtered mail from Subjexct:/From:/To:/Message-ID: only" is tough work.
    (c-1) Key value such as MessageKey==Order Received column value)/MessageOffset is needed.

To resolve problem of (b-1)/(c-1), following may be better.
  Split "filtter logging" to 
        (d-1) filter logging by C++ code without localization support.
              Time=...[VTAB]From=...[VTAB]MessageKey=...[VTAB]Subject=... like format is sufficient.
              "Writing in utf-8" only is sufficient for localization support.
    and (d-2) "filter log formatter" or "converter to HTML", by JavaScript, with localization support.
              Because of JavaScript code, Tb can freely format log data pretty easily.
              If "target folder name" and "MessageKey in target folder" is known by log,
              feature of "view filtered mail from log" can be implemented relativelly easily.
If this kind of improvement will be made, resolving (b-2) may be easier than current.
rkent, do you have a preference where this verbose logging should be put? In the NSPR log or the filterlog.html ? We could potentially put a lot of output there, which looks like it would fit in the verbose output we get from the NSPR logs when enabled.
Flags: needinfo?(kent)
NSPR log being https://wiki.mozilla.org/MailNews:Logging with a new module "filters" added.
If you ask me "what is needed to improve filter logging?", then what I would suggest that you work on is a user-visible way to report errors in filter application. A bare NS_ENSURE_SUCCESS in filter application, which we do now, is useless to users except to prevent crashes. One possible user interface for that would be to add an error line in the status bar that could be used to report filtering errors, that is "sticky" in the sense that it remains visible until the user takes some action to dismiss it after an error occurs. That would allow our 10,000,000 daily users to help themselves for issues.

I am not opposed to the NSPR errors though, I just don't think that should be the priority. An NSPR error realistically is targeted to provide input to just a few people on the core Thunderbird team. Our capacity to actually review those logs is of the order of only some tens of logs per year at best. For ExQuilla, I make extensive use of detailed logging such as you are suggesting (though I use an XPCOMified log4moz which introduces performance issues). For a professional product such as ExQuilla, I probably review a log for each 1% of users per year. Scaling that to Thunderbird would require that we review 100,000 logs per year, which is 3 orders of magnitude beyond what is realistic.

On the specific question that you asked, I prefer NSPR over the existing filter log for the kind of detailed progress and error reporting that you are planning. The filter log as we are currently adjusting it in bug 935934 is clearly defined as occurring once per action, after the search succeeds but before the action is applied (successfully or unsuccessfully). This precision makes it useful to the users in diagnosing issues. Cluttering that with progress reports would make it more confusing and less useful to the average user.

I would however also support a simple error message that can be added to the filter log and shows a failure to apply an action. So before NS_ENSURE_SUCCESS in the filter action code call something like LogRuleHit that would report an error in applying the filter. That would be more useful than the NSPR log, but less useful than the visible error I am suggesting.
Flags: needinfo?(kent)
My main concern was to help bug where users think filters were not even run or not on the messages they expected. The new log would log even filter runs that do not match any messages. We would just gain the safety that the filters were run, just the user probably misconfigured the rules.
Think about the bugs where users claim that all filters stop executing after "stop filter execution" action. So this new NSPR log should be useful to us, bug triagers/coders as you correctly assume. And we would request them from the users only when necessary, like we currently do for e.g. POP3 logs.
Adding one line about success/failure to the current user-visible log would be useful and I can try to add it.

I am skeptical on the status line feature as I currently do not know how to make a message stick. We have a bug for showing status bar history, maybe that would help to sme extent. But I would not mix this feature into this bug.
Assignee: nobody → acelists
The issue aceman is one of priorities. I believe that it is really bad to silently be eating errors in filters, and logging that in way that the users can see routinely is a more valuable use of your time (and my time) than adding NSPR logging which is targeted to developers. But if you want to add NSPR logging, go ahead. See also discussions in Bug 1071754.
I'm on the side of doing both. 
NSPR because 
- having filter NSPR log data with other log data is a plus
- it is more efficient? (dunno)
user accessible filter log 
- NSPR can be a bitch for end users

While you're in the code, is doing both much harder than doing one?
For me it is not about priorities but about current abilities. I can't do the proposed status bar thingy. But I think I can do the NSPR part from what I learned at bug 1075149.

(In reply to Wayne Mery (:wsmwk) from comment #8)
> I'm on the side of doing both. 
> NSPR because 
> - having filter NSPR log data with other log data is a plus
> - it is more efficient? (dunno)
Yes, I'll try this.

> user accessible filter log 
> - NSPR can be a bitch for end users
I accepted the idea of at least putting info about success/failure of the applied action to the existing user visible log. I will try to do that too.

> While you're in the code, is doing both much harder than doing one?
I think the two are completely separate so doing both does not add complexity.
I think Firefox has done some new things in the logging area. Unfortunately at the moment I can't remember where. Regardless...

Given that this may require strings, and filters is one of the most beloved of features in Thunderbird (I come across examples almost monthly), would you consider doing something for 52 (even if it is a tiny start) so that we can address some of the issues you mention in comment 0?
Flags: needinfo?(acelists)
I *think* when the filter is run,
its success or failure can be now reported with some information such as
the header lines of the e-mail message, etc.

However, looking at the required info, 
C + There is 1 block of information for each run of filters (with timestamp).
++ There is then 1 block of data for each account where filters are run
   (probably only those that have new msgs or where manual run). The block contains this data:
? ++ How were filters invoked (automatic run, manual, automatic after classification).
? ++ Which account is subject to the filter run (important for merged/global inbox).
? ++ The type of the account (like POP, IMAP, News, Local, RSS).
? ++ How many enabled/disabled filters are being run (which are applicable
   depending on the invocation type above).
? +++ Then, for each folder (or newsgroup) in this account there would be this block of data:
? +++ In case of newsgroup, show the filter count again, as they can be defined per-newsgroup.
? +++ How many messages are subject to these filters (count of new messages for automatic,
    count of all messages if manual).
? +++ List of filter hits in similar format to the existing one (in Filter log) - which
    filter matched, on which message (subject) and what action was taken.
+++ End of block '+++'
++ End of block '++'
+ End of block '+'

(C stands for "Can be done". "?" stands for I have no idea at this moment.)

Aha, I was referring to the information about the message thus filtered,
and I am not sure if the statistics/filter about the filter (especially the summarized information
is available unless such information is stored somewhere and there are accessor functions.)

The filter logging function I am referring to is invoked at the
time an e-mail message is received for filtering (on the delivery of e-mail),
and is invoked from the message delivery code.
So it only records automatic invocation I think (but I could be wrong.)
See Also: → 355902
I think we modified the logging to include better info although if I don't think the change we did cover all the info in my previous comment.
One more thing that would help people finding their filters not working as expected:
As part of "List of filter hits..." the information about *what rule(s)* made the filter select the message, in case that the user selected "Match any of the following". 

As an example: I get a whole lot of insurance SPAM that seems impossible to be detected by a SPAM filter. So I create an own filter with all kinds of suspect words. Once in a while, a Non-SPAM message gets selected by the filter, even though I checked the content of the email word by word against all words in my filter. No match found (also considering parts of the words). So it boils down to recheck the matched messages by hand again. 

With the information, why a message was selected, finding the problem would be easy.
Blocks: 1071628
See Also: → 1135265
See Also: → 230668
See Also: → 174839
Attached patch 697522.patch WIP (obsolete) — Splinter Review

This is a crude version of the NSPR filter logging. To enable you set MOZ_LOG=Filters:5 to watch the Filters module.
Please see if this is going in the right direction. I will also go through the comments here and polish some parts, e.g. comment 13 isn't yet incorporated.

Flags: needinfo?(acelists)
Attachment #9053509 - Flags: feedback?(vseerror)
Attachment #9053509 - Flags: feedback?(mkmelin+mozilla)
Status: NEW → ASSIGNED
Comment on attachment 9053509 [details] [diff] [review]
697522.patch WIP

Review of attachment 9053509 [details] [diff] [review]:
-----------------------------------------------------------------

Great to see this.  

I'm no good at evaluating actual code - Chiaki would be better there than me - so I assume that's not why you asked me for feedback.  I skimmed the locations where MOZ_LOG(FILTERLOGMODULE is added and I see a good distribution of logging levels: Warning, Error, Debug, etc. I did not evaluate against comment 0, you are in a much better to understand if you hit all those points.  I do have these questions:

1. Are there any performance concerns with the added code?

2. Can "normal" filter logging be added to partly cover the last paragraph of Kent's comment 5 where LogLevel::Error occurs (just 9 locations)?  I realize Kent offered it as an alternative to NSPR logging.  But it could be helpful because the  filter log is much easier for users to read and enable than NSPR.

Also I would be happy to give feedback based on a try build.
Attachment #9053509 - Flags: feedback?(vseerror) → feedback+

Aceman,

I think it is in the right direction.

However, isn't this filter logging something that should be logged ALWAYS and possibly checkable in the
error/logging console window?

I may be wrong, but I thought this MOZ_LOG thingy is only visible when we set environmental variable under linux version.
Does it behave differently, say, under Windows?

I would rather see the filtering action in gory detail in the error/log window if I set some preference (boolean setting to true).

Otherwise, it would be rather difficult to analyze filtering mishap/error cases.
(I would probably turn such boolean variable ALWAYS ON).

Just an idea.

Comment on attachment 9053509 [details] [diff] [review]
697522.patch WIP

Review of attachment 9053509 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/base/search/public/nsMsgFilterCore.idl
@@ +25,5 @@
>    const long All              = Incoming | Manual;
>  };
>  
> +%{C++
> +  void filterTypeName(nsMsgFilterTypeType filterType, nsCString &typeName);

why would you keep this as c++? 
also, seems like an odd signature. I think it should return the name, not set it to the member

::: mailnews/base/search/src/nsMsgFilterList.cpp
@@ +1239,5 @@
> +    } else {
> +      if (filterType & nsMsgFilterType::InboxRule)
> +        typeName.Append("InboxRule&");
> +      if (filterType & nsMsgFilterType::InboxJavaScript)
> +        typeName.Append("InboxJavascript&");

JavaScript

@@ +1248,5 @@
> +      if (filterType & nsMsgFilterType::NewsRule)
> +        typeName.Append("NewsRule&");
> +      if (filterType & nsMsgFilterType::NewsJavaScript)
> +        typeName.Append("NewsJavascript&");
> +    }

and here too

::: mailnews/base/util/nsMsgUtils.cpp
@@ +2064,5 @@
>    NS_ASSERTION(aValue <= PR_UINT32_MAX, "Msg key value too big!");
>    return aValue;
>  }
>  
> +NS_MSG_BASE uint32_t msgKeyToInt(nsMsgKey aMsgKey)

why wouldn't you just print them as unsigned longs. casting could in theory give a wrong an confusing log
Attachment #9053509 - Flags: feedback?(mkmelin+mozilla)

(In reply to Wayne Mery (:wsmwk) from comment #15)

I skimmed the
locations where MOZ_LOG(FILTERLOGMODULE is added and I see a good
distribution of logging levels: Warning, Error, Debug, etc.

Thanks.

  1. Are there any performance concerns with the added code?

There is possibly slight perf penalty because we are fetching some properties of messages and filters for the log messages that we otherwise wouldn't. But it shouldn't be as bad as it is all in the C++ paths. You can test it on the try build, run a filter manually on thousands of messages.

  1. Can "normal" filter logging be added to partly cover the last paragraph
    of Kent's comment 5 where LogLevel::Error occurs (just 9 locations)? I
    realize Kent offered it as an alternative to NSPR logging. But it could be
    helpful because the filter log is much easier for users to read and enable
    than NSPR.

I can try that. You mean the html filter log the user can see, not the Error console.

Also I would be happy to give feedback based on a try build.

There is a try build at
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=199b611a732291cc90096b804c5e140ebc34af04

(In reply to ISHIKAWA, Chiaki from comment #16)

However, isn't this filter logging something that should be logged ALWAYS and possibly checkable in the
error/logging console window?

I think there would be a very high perf penalty if all these messages went into the Error console. Also it is more complicated from C++ to push something to the console. If the instrumented filter code would be in JS, it would be much easier. There are logging facilities for JS that dump into the Error console.

I may be wrong, but I thought this MOZ_LOG thingy is only visible when we set environmental variable under linux version.
Does it behave differently, say, under Windows?

Yes, you must set the env variable on all platforms.

I would rather see the filtering action in gory detail in the error/log window if I set some preference (boolean setting to true).

Otherwise, it would be rather difficult to analyze filtering mishap/error cases.
(I would probably turn such boolean variable ALWAYS ON).

I agree, but I think that is more complicated.

Even with this log as implemented, you can have the env var permanently enabled and log into a file. Then whe you think you saw a filtering error, you close TB immediately and save the log for analysing.

(In reply to Magnus Melin [:mkmelin] from comment #17)

::: mailnews/base/search/public/nsMsgFilterCore.idl

+%{C++

  • void filterTypeName(nsMsgFilterTypeType filterType, nsCString &typeName);
    why would you keep this as c++?

I didn't deem it important enough to put it into the public API of some of the object type. But I can put it into some object if you wish. Maybe into nsIMsgFilter or nsIMsgFilterList.

also, seems like an odd signature. I think it should return the name, not
set it to the member

Ok, I can try.

::: mailnews/base/util/nsMsgUtils.cpp
@@ +2064,5 @@

+NS_MSG_BASE uint32_t msgKeyToInt(nsMsgKey aMsgKey)

why wouldn't you just print them as unsigned longs. casting could in theory
give a wrong an confusing log

We want to treat nsMsgKey as an opaque object where no caller should assume what type it is and not assume it is a number. Only the selected helpers should know how to convert nsMsgKey to int or something else. Similarly we have msgKeyFromInt().
When would the cast give wrong result?

(In reply to :aceman from comment #20)

I didn't deem it important enough to put it into the public API of some of
the object type. But I can put it into some object if you wish. M

Yes that would be better.

We want to treat nsMsgKey as an opaque object where no caller should assume
what type it is and not assume it is a number.

That's probably not too useful. It's not like it could change.

should know how to convert nsMsgKey to int or something else. Similarly we
have msgKeyFromInt().
When would the cast give wrong result?

When you have an int that is large enough and would need to be long

(In reply to Magnus Melin [:mkmelin] from comment #21)

We want to treat nsMsgKey as an opaque object where no caller should assume
what type it is and not assume it is a number.
That's probably not too useful. It's not like it could change.

Actually it can change. We made great effort in the 4GB+ folders to decouple it from the numeric offset of a message in the mbox file. It could also change to 64bit if we need in the future.

should know how to convert nsMsgKey to int or something else. Similarly we
have msgKeyFromInt().
When would the cast give wrong result?
When you have an int that is large enough and would need to be long

If we changed it to long (you mean uint64_t) and made the helper return uint64_t then wouldn't the compiler complain in the prinfs? Or you object to the name of the function containing a generic 'int'?

Attached patch 697522.patch (obsolete) — Splinter Review

I added the output to filterlog.html as per Wayne's suggestion. It needed the move of the file write from nsIMsgFilter to nsIMsgFilterList. The logging is just rudimentary (and the messages aren't localized for now). The idea is that if user notices weird filter behaviour and visits this log (accessible in the UI) and sees any of the simple errors, we can point him to the full logging for debugging.

Attachment #9053509 - Attachment is obsolete: true
Attachment #9058110 - Flags: review?(mkmelin+mozilla)

(In reply to H from comment #13)

One more thing that would help people finding their filters not working as
expected:
As part of "List of filter hits..." the information about what rule(s)
made the filter select the message, in case that the user selected "Match
any of the following".

I looked at this, but the filter rules are sent as a single expression string into the evaluator whether they match a message or not. At a match, it isn't as easy to display which rule was the one that made the hit. This needs to think out some representation and display of the information. That is more complicated to do int his bug.
But it is a useful idea for the log, please file it as a new bug so it doesn't get lost.

Attached patch 697522.patch v1.1 (obsolete) — Splinter Review

Refreshed after the C++ reformatting in mailnews.

Attachment #9058110 - Attachment is obsolete: true
Attachment #9058110 - Flags: review?(mkmelin+mozilla)
Attachment #9060534 - Flags: review?(mkmelin+mozilla)
Comment on attachment 9060534 [details] [diff] [review]
697522.patch v1.1

Review of attachment 9060534 [details] [diff] [review]:
-----------------------------------------------------------------

Looks ok to me, r=mkmelin

::: mailnews/base/search/public/nsIMsgFilterList.idl
@@ +91,5 @@
>      attribute nsIOutputStream logStream;
>      readonly attribute ACString logURL;
>      void clearLog();
>      void flushLogIfNecessary();
> +    // Push a message to the filter log file, adding a timestamp.

Let's use real documentation comments for documentation, so we can get generated documentation going at some point. 

So, either /// or /**  */

::: mailnews/base/search/src/nsMsgFilterList.cpp
@@ +45,5 @@
>    m_temporaryList = false;
>    m_curFilter = nullptr;
> +  m_listId.Assign("List");
> +  m_listId.AppendInt(nextListId++);
> +  MOZ_LOG(FILTERLOGMODULE, LogLevel::Info, ("Creating new filter list %s", m_listId.get()));

Can we make that

Creating anew filter list with id=%s

::: mailnews/base/util/nsMsgDBFolder.cpp
@@ +2581,5 @@
>    junkMailPlugin->GetUserHasClassified(&userHasClassified);
>    if (!userHasClassified)
>      filterForJunk = false;
>  
> +  MOZ_LOG(FILTERLOGMODULE, LogLevel::Info, ("Spam filter will %s run", filterForJunk ? "be" : "not be"));

"be" is in both. In actual usage I find cases like these are better to output as one string and a boolean so it's easier to search and compare logs. I mean like

"Will run spam filter: %s", filterForJunk ? "true" : "false"
Attachment #9060534 - Flags: review?(mkmelin+mozilla) → review+

Thanks, fixed the problems.

Attachment #9060534 - Attachment is obsolete: true
Attachment #9061130 - Flags: review+
Keywords: checkin-needed

Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/dfea600bd2b2
add logging of message filter runs and actions. r=mkmelin

Status: ASSIGNED → RESOLVED
Closed: 7 months ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 68.0

Would we want to put some documentation on this feature somewhere?

I have some time this holiday week if you give some information regarding this feature.

TIA

Flags: needinfo?(acelists)

While we could theoretically have this in release notes, maybe it is a too level feature for advanced users only.
I think bug 864187 would be a much better candidate for the release notes.

Flags: needinfo?(acelists)

But worth mentioning on maildev or other places where triagers and support personnel gather.
And it could be included in a (yet to be created?) general debugging SUMO article.

Blocks: 1547508
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/547dab7a9234
Follow-up: reformat logging code. rs=reformat DONTBUILD
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/e51d02c81b86
Follow-up: reformat logging code, take 2. rs=reformat DONTBUILD
Blocks: 1551043
Regressions: 1563959
You need to log in before you can comment on or make changes to this bug.