Closed Bug 276732 Opened 20 years ago Closed 17 years ago

Potential optimization in bayes filter

Categories

(Thunderbird :: General, defect)

defect
Not set
trivial

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: whoelse, Assigned: mscott)

Details

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a6) Gecko/20041221 Firefox/1.0+
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a6) Gecko/20041221 Firefox/1.0+

In the nsBayesianFilter::classifyMessage function at
http://lxr.mozilla.org/seamonkey/source/mailnews/extensions/bayesian-spam-filter/src/nsBayesianFilter.cpp#930,
it is possible to optimize the occasion that the listener parameter is null.

The filter will automatically classify a message as "good" or "junk" depending
on whether the training file is missing one or the other.  If no "good" tokens
exist, then all email is automatically marked "junk", and vice versa.  The
conditions at lines 946 and 951 of the above file make this determination.  The
problem is that if listener *is* null, then the message contents are scanned
anyway (and discarded at the end with the final non-null listener check), which
could leave room for efficiency.

Granted, it's possible that the listener param is never null, which would make
this bug moot.  In the event this bug is not moot, it might be more efficient to
do the following:

PRBool noGood = (!mGoodCount && !mGoodTokens.countTokens());
PRBool noBad = (!mBadCount && !mBadTokens.countTokens());
PRBool isJunk;

if (noGood || noBad) {
  if (noGood)
  {
    PR_LOG(BayesianFilterLogModule, PR_LOG_ALWAYS, ("no good tokens, assume junk"));
    isJunk = PR_TRUE;
  }
  else
  {
    PR_LOG(BayesianFilterLogModule, PR_LOG_ALWAYS, ("no bad tokens, assume good"));
    isJunk = PR_FALSE;
  }
  if (listener)
    listener->OnMessageClassified(messageURI, isJunk ?
nsMsgJunkStatus(nsIJunkMailPlugin::JUNK) :
nsMsgJunkStatus(nsIJunkMailPlugin::GOOD));
  return;
}

Reproducible: Always
Seems valid enough :-)
Status: UNCONFIRMED → NEW
Ever confirmed: true
QA Contact: general
(In reply to comment #0)
> 
> Granted, it's possible that the listener param is never null, which would make
> this bug moot.

There is no valid use of this function with a null listener, as that is the method it uses to return its primary result. The check for null listener is inserted defensively to prevent crashes when the method is misused. I don't see the point of adding additional code to optimize for a condition that should never occur.
Given Kent's comment, it appears the core assumption of the suggestion is flawed, so marking this resolved/invalid.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → INVALID
You need to log in before you can comment on or make changes to this bug.