
SpamAssassin

Out of the box, SpamAssassin does a reasonably good job of tagging spam, but you can make it even better by enabling Bayesian filtering. The increased effectiveness doesn't come for free: you will need to actively train SpamAssassin's Bayesian filter by telling it about spam messages it missed.

Enabling Bayesian Filtering

By default, we have disabled SpamAssassin's Bayesian filtering because it significantly increases the time needed to scan each message and is of no use unless the user actively trains the filter. To enable Bayesian filtering, add the following lines to ~/.spamassassin/user_prefs:
   # turn on bayesian filtering
   use_bayes	1
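
If you would rather not edit the file by hand, the same change can be scripted. The following is a minimal sketch, not part of the site's tooling: the function name enable_bayes is hypothetical, and it assumes the prefs file lives at the path given above unless another path is passed.

```shell
# Hypothetical helper: turn on use_bayes in a SpamAssassin prefs file
# without adding a duplicate line if it is already enabled.
enable_bayes() {
    prefs="${1:-$HOME/.spamassassin/user_prefs}"
    # Create the directory and file if they do not exist yet.
    mkdir -p "$(dirname "$prefs")"
    touch "$prefs"
    # Look for an existing, uncommented "use_bayes 1" line.
    if grep -Eq '^[[:space:]]*use_bayes[[:space:]]+1([[:space:]]|$)' "$prefs"; then
        echo "use_bayes already enabled in $prefs"
    else
        printf '# turn on bayesian filtering\nuse_bayes\t1\n' >> "$prefs"
        echo "enabled use_bayes in $prefs"
    fi
}
```

Calling enable_bayes with no argument would act on ~/.spamassassin/user_prefs; running it a second time should leave the file unchanged.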

Training the Bayesian Filter

Initially, SpamAssassin's Bayesian filter doesn't know anything about spam; you have to train it.
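
To see how much training the filter has had so far, you can inspect the output of "sa-learn --dump magic". The sketch below is only an illustration: the function name bayes_counts is hypothetical, and it assumes the usual dump layout in which the third column of the "non-token data: nspam" and "non-token data: nham" lines holds the number of messages learned. In a stock SpamAssassin configuration the Bayes rules are not applied until enough spam and non-spam have been learned (the bayes_min_spam_num and bayes_min_ham_num settings, 200 each by default, unless the site configuration changes them).

```shell
# Hypothetical helper: print "nspam nham" from `sa-learn --dump magic`
# output read on standard input. Assumes the third column of the
# "non-token data: nspam" and "non-token data: nham" lines is the count.
bayes_counts() {
    awk '
        /non-token data: nspam$/ { spam = $3 }
        /non-token data: nham$/  { ham = $3 }
        END { printf "%d %d\n", spam, ham }
    '
}
```

Typical use would be: sa-learn --dump magic | bayes_counts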

Bulk Spam Training

If you have a mailbox file full of spam, SpamAssassin can train its Bayesian filter by examining the email in that mailbox. Run:
  sa-learn --spam --mbox <path to spam mailbox>
You can safely run this command repeatedly on the same mailbox; it will only update the Bayesian filter with new spam messages.
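
If you run the bulk command often, a small wrapper can catch a mistyped or empty mailbox path before sa-learn is started. This is a sketch under stated assumptions: the function name train_spam_mbox and the SA_LEARN override variable are hypothetical, and the only sa-learn options used are the ones shown above.

```shell
# Hypothetical helper around the bulk-training command shown above.
# SA_LEARN may be overridden (for example with "echo") to do a dry run.
train_spam_mbox() {
    mbox="$1"
    if [ -z "$mbox" ]; then
        echo "usage: train_spam_mbox <path to spam mailbox>" >&2
        return 2
    fi
    if [ ! -f "$mbox" ]; then
        echo "no such mailbox: $mbox" >&2
        return 1
    fi
    # An empty mailbox has nothing to teach the filter.
    if [ ! -s "$mbox" ]; then
        echo "mailbox is empty, nothing to learn: $mbox"
        return 0
    fi
    "${SA_LEARN:-sa-learn}" --spam --mbox "$mbox"
}
```

Setting SA_LEARN=echo before calling the function prints the command's arguments instead of running it, which is a cheap way to check the path first.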

Bayesian Filtering in Combination with GUI-Based Mail Clients

The bulk spam training method works well in conjunction with GUI-based mail clients such as Outlook, Mozilla, and Thunderbird. Configure the mail client to put spam messages into a folder on the IMAP server. Typically this is done either automatically, by the client-side spam filters available in all of the listed clients, or by hand. Once a week, you can run sa-learn in bulk mode against this spam folder on the server to update the server-side Bayesian filter.
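
The weekly run can be left to cron. The entry below is a sketch only: the folder path $HOME/mail/spam is a hypothetical placeholder for wherever your IMAP spam folder is stored on the server, and it assumes that folder is kept in mbox format. Install it with "crontab -e".

```
# m h dom mon dow  command
# Every Sunday at 03:00, learn from the (hypothetical) spam folder.
0 3 * * 0  sa-learn --spam --mbox "$HOME/mail/spam" >/dev/null
```

Only normal output is discarded here, so cron can still mail you any error messages from sa-learn.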

Mutt

You can set up Mutt keybindings so that hitting the 'X' key while you're not reading your spambox deletes the message and updates SpamAssassin's Bayesian filter, marking the message as spam. Hitting 'X' while reading your spambox lets SpamAssassin know that the message shouldn't have been marked as spam. (You'll probably want to save this message someplace other than your spambox.)

The lines you need to include in your ~/.muttrc file to bind the 'X' key are here. After updating your ~/.muttrc file, you're ready to go. Just use "mutt" to read your mail as always. When you get a spam message, delete it with the 'X' key (instead of the usual 'd' key). The more spam you delete with the 'X' key, the better SpamAssassin will get at recognizing spam.

Occasionally, you should read your spambox (start mutt like this: mutt -f /var/mail/$LOGNAME.spam). If you find a message that should not have been considered spam, hit the 'X' key. (You'll probably want to move it out of your spambox as well.)
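
For illustration, bindings with the behavior described in this section could look roughly like the fragment below. This is a hypothetical sketch, not the configuration referred to above: it assumes sa-learn is on your PATH and that your spambox is the only folder whose name ends in ".spam".

```
# Hypothetical sketch of 'X' bindings for ~/.muttrc.
# In ordinary folders: pipe the message to sa-learn as spam, then delete it.
folder-hook . 'macro index X "<pipe-message>sa-learn --spam<enter><delete-message>" "learn as spam and delete"'
# In the spambox: tell sa-learn the message is not spam (ham).
folder-hook '\.spam$' 'macro index X "<pipe-message>sa-learn --ham<enter>" "learn as not spam"'
```

The hooks are listed from general to specific so that the spambox binding is applied last when you open that folder. Depending on your wait_key setting, Mutt may pause for a keypress after sa-learn runs.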

Pine

To use SpamAssassin's Bayesian filter from Pine, you'll have to change your Pine settings to enable Unix pipe commands. Here is how this is done:
  1. Start pine
  2. Type 'S' (Setup)
  3. Type 'C' (Config)
  4. Use the down-arrow key to highlight 'enable-unix-pipe-cmd' (which is in a list under Advanced Command Preferences).
  5. If necessary, "set" this preference by typing 'X'. (There should then be an X in the box to the left of the preference.)
  6. Type 'E' (Exit Setup)
  7. Type 'Y' (Save Changes)

When spam gets into your non-spam mailbox (a "false negative"), you will want to adjust SpamAssassin's database to recognize similar messages as spam:

  1. Highlight the message in the message index.
  2. Press the '|' (vertical bar) key.
  3. Type the command 'sa-learn --spam' and press return.

As with Mutt, you will occasionally need to read your spambox. Start pine like this:
  pine -f /var/mail/$LOGNAME.spam
When you find a message in your spambox that is not spam (a "false positive"), you will want to adjust SpamAssassin's database to recognize similar messages as non-spam in the future:
  1. Highlight the message in the message index.
  2. Press the '|' (vertical bar) key.
  3. Type the command 'sa-learn --ham' and press return.
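
The two Pine procedures above differ only in the flag passed to sa-learn, which reads the piped message on standard input. A small wrapper can make that explicit and reject anything other than the two expected choices. This is a sketch: the function name sa_mark and the SA_LEARN override variable are hypothetical.

```shell
# Hypothetical wrapper: read one message on standard input and pass it to
# sa-learn as either spam or ham. SA_LEARN can be overridden for a dry run.
sa_mark() {
    case "$1" in
        spam|ham)
            # sa-learn reads the message from standard input.
            "${SA_LEARN:-sa-learn}" "--$1"
            ;;
        *)
            echo "usage: sa_mark spam|ham < message" >&2
            return 2
            ;;
    esac
}
```

To use it at Pine's pipe prompt, the function would need to live in an executable script on your PATH (for example a hypothetical ~/bin/sa-mark that defines it and ends with the line sa_mark "$@"); you would then type 'sa-mark spam' or 'sa-mark ham' in step 3.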

Page Owner: Mark Dieterich Last Modified: Thu Jun 9 10:51:03 2005