This is the very beginning of a recipe for using the crm114 mailfilter as a spam filter with qmail+vpopmail+qmailadmin.

Use the code and information contained herein at your own risk.

The crm114 code and scripts presented here are the first workable fruits of asking the following question.

Can mailfilter.crm be used on a qmail+vpopmail server in a POP-only, single-folder, single-account context, so that the MUA (Mail User Agent, e.g. Outlook, Eudora, etc.) can flag spam and the end user can train the filter located on the server, without requiring installation of any additional (non-qmail, non-vpopmail) local-delivery software such as procmail, maildrop, or safecat?

The answer so far is yes. The strategy has apparently been working well, with some caveats, on my server since early April 2005, i.e. for almost a month as of this writing.

The information provided here is intended to help someone who wants to accomplish a similar goal and who could probably have done something similar themselves, but who may choose to use this code and information to get a head start. In particular, you should read about the crm114 mailfilter, install it, and make it do a few trivial things for you before you attempt to make any serious use of the information here.

Again, this is a recipe for server-side filtering aided by MUA detection of spam status headers generated by the crm114 "mailfilter.crm" filter. To work correctly, these scripts require that mailfilter.crm be configured in particular ways. It is possible I may omit some information needed to use this code correctly, and while I intend to correct such omissions eventually, I also want to put up this information quickly, at the possible expense of completeness and accuracy. Also, for now I am stating some things descriptively rather than technically because I don't have all the technical information at my fingertips. Again, use this at your own risk.

The code here is small. I am for now not providing a tarball, but rather a bunch of individual sources in a directory.

Known problems:

Other caveats:

CRM114 mailfilter configuration requirements:

Mail server configuration requirement:

Other configuration requirements:

Here is the directory containing all the code I used to enhance my qmail+vpopmail+qmailadmin system to support crm114 "mailfilter.crm"-based spam filtering which can be enabled on a per-user basis using qmailadmin:

view sources here

You may have some problems with some of the source links in the above directory and in the links below if the file extension is not handled well by your browser. Personally I had trouble with the .sh file not being viewable, although my browser downloaded it.

None of this will do you much good without some explanation. What follows is the briefest explanation.

The strategy used by this recipe is that the MUA can train the filter to recognize incorrectly classified messages by resending the misclassified mail to the server. A message is trained as spam by resending it to a special spam address that the server administrator provides to end users who are permitted to do training. A message is trained as non-spam by resending it to its final delivery address. "Resend" specifically refers to a function (often called "Redirect") which resends the entire message with all headers to a specified new destination address. With that background, what follows is an explanation of the purpose of each of the sources.
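
As a rough illustration of the strategy just described, the choice among the three actions (classify, train as spam, train as non-spam) can be sketched in shell. This is not code from the sources directory. The function names are hypothetical, and the sketch assumes that the MUA's Redirect function adds a Resent-From: header containing the user's own address, which may not hold for every mail client.

```shell
#!/bin/sh
# Hypothetical sketch only; not one of the recipe's scripts.
# Assumption: a resend-to-self shows up as a "Resent-From:" header
# that contains the recipient's own address.

# Print only the header section (everything up to the first empty line).
headers() {
    sed '/^$/q'
}

# Usage: choose_action RECIPIENT SPAM_ADDRESS < message
# Prints one of: learnspam, learnnonspam, classify
choose_action() {
    rcpt=$1
    spam_addr=$2
    if [ "$rcpt" = "$spam_addr" ]; then
        # Anything sent to the special spam address is trained as spam.
        echo learnspam
    elif headers | grep -i '^Resent-From:' | grep -iF -e "$rcpt" >/dev/null; then
        # The user redirected the message to their own address.
        echo learnnonspam
    else
        # Ordinary incoming mail: just classify it.
        echo classify
    fi
}
```

For example, piping a message whose headers include "Resent-From: user@example.net" into "choose_action user@example.net spam@example.net" would select the train-as-nonspam action.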

kdelivermail
This shell script is the outer wrapper for the primary filter to be installed in the mail pipeline wherever filtering functionality is desired. The kdelivermail filter supports classification in the normal case, running the message through the mailfilter.crm in order to add the appropriate X-CRM114 headers, particularly the X-CRM114-Status header, which will contain the text "good" or "spam" so that the MUA can filter messages accordingly. The kdelivermail filter also supports recognition of resend-to-self as a cue to train a message as non-spam instead of doing the default operation of classifying the message. This script calls chkselfresend.crm to determine the appropriate action and mailfilterwrap.sh to do the appropriate kind of filtering.
kspammail
This shell script is the outer wrapper that should be associated with the special spam-training address chosen by the administrator. It should be connected up to the mail pipeline for the spam-training address in the same way that kdelivermail is connected up to ordinary addresses for which filtering is enabled. However, you will not be able to get qmailadmin to automate this for you. Qmail/vpopmail users will have to do it by hand in the .qmail file associated with the special spam address. kspammail calls unresend.crm to filter the incoming message so that it matches the message originally classified as closely as possible. The results of that filtering are passed on directly to mailfilter.crm invoked with the --learnspam option.
chkselfresend.crm
This crm filter is designed to detect the resend-to-self case to trigger the train-as-nonspam action instead of the normal classify action. This filter prepends an internal-use-only line at the front of the message which indicates which of the two actions is appropriate.
mailfilterwrap.sh
This shell script does most of the filtering work of kdelivermail, accepting output from chkselfresend.crm and passing the message through either a classifying filter or a train-as-nonspam filter accordingly. The initial trigger line from chkselfresend.crm is absorbed. The filtered message is sent to the standard output.
movemsgid.crm
This crm filter is used to move the Message-ID header to the end of the headers, for reasons explained above. This is not an ideal thing to do and no doubt does not comply with RFCs. I find it to be relatively harmless for my purposes, and decided to use it on my server in order to achieve better accuracy with less work than would be needed to achieve a more proper and compliant implementation. You may choose to omit this with perhaps only a slight loss of filtering accuracy. Or you may choose another approach.
unresend.crm
This crm filter is intended to undo the combined effect of final local delivery and resending by the MUA. It was tested against vpopmail delivery and (gasp) Outlook Express for Macintosh, and so far no other email clients. Between the original act of classifying and the eventual act of retraining a message (when needed), it is desirable that there be no changes to the message text, so that retraining operates on exactly the same message as was originally classified. Hence the need to undo what happened between the classification done on the server prior to original delivery and the final receipt by the server of a message for retraining, which is assumed to reach the server via a resend/redirect function of the MUA. What happens after classification is: (1) final delivery processing on the server to the appropriate POP directory, (2) receipt and storage by the MUA, which should involve NO changes, (3) message changes applied by the MUA's resend or redirect command. Thus (1) and (3) must be undone, and this is the purpose of this crm script. The message output from this script should match the message that was originally classified, satisfying the single basic requirement for training accuracy.
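
To make the division of labor between chkselfresend.crm and mailfilterwrap.sh more concrete, here is a minimal sketch of the "absorb the trigger line, then dispatch" step. It is not the actual mailfilterwrap.sh. The trigger-line text ("ACTION: ...") is invented for illustration, the function and variable names are hypothetical, and the two filter commands are placeholders which assume that mailfilter.crm is directly executable and accepts a --learnnonspam option; check both against your own installation.

```shell
#!/bin/sh
# Hypothetical sketch only; not the real mailfilterwrap.sh.
# Assumptions: the trigger-line wording below is made up, and the two
# commands are placeholders that should be adjusted for your crm114 setup.
CLASSIFY_CMD=${CLASSIFY_CMD:-/usr/bin/mailfilter.crm}
TRAIN_NONSPAM_CMD=${TRAIN_NONSPAM_CMD:-/usr/bin/mailfilter.crm --learnnonspam}

# Usage: dispatch_message < (trigger line followed by the message)
# Writes the filtered message to standard output.
dispatch_message() {
    # Read and absorb the first line; the rest of stdin is left for the filter.
    IFS= read -r trigger || return 111
    case $trigger in
        'ACTION: train-nonspam') $TRAIN_NONSPAM_CMD ;;
        'ACTION: classify')      $CLASSIFY_CMD ;;
        *)
            echo "unexpected trigger line: $trigger" >&2
            return 111   # qmail treats exit code 111 as a temporary failure
            ;;
    esac
}
```

Returning 111 on an unrecognized trigger line is a design choice for this sketch: qmail will retry the delivery later rather than lose or bounce the message.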

I put all of the above files in the /usr/bin directory on my server, which is also where crm114 is installed by default. The scripts use hard-coded references to other scripts assuming they are located in /usr/bin. You may need to correct this for use on your server.
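
If you install the files somewhere else, a quick check like the following may help confirm that every script is where the others expect to find it. This is only a sketch: SCRIPT_DIR and check_recipe_scripts are hypothetical names that do not appear in the original scripts, which simply hard-code /usr/bin.

```shell
#!/bin/sh
# Hypothetical helper; not part of the recipe's sources.
# SCRIPT_DIR is an assumed variable name; the default mirrors the /usr/bin
# location used on the author's server.
SCRIPT_DIR=${SCRIPT_DIR:-/usr/bin}

# Report any recipe file that is missing from SCRIPT_DIR.
# Returns 0 if all files are present, 1 otherwise.
check_recipe_scripts() {
    missing=0
    for name in kdelivermail kspammail chkselfresend.crm \
                mailfilterwrap.sh movemsgid.crm unresend.crm; do
        if [ ! -f "$SCRIPT_DIR/$name" ]; then
            echo "missing: $SCRIPT_DIR/$name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running "SCRIPT_DIR=/usr/local/bin check_recipe_scripts" would list any of the six files that are absent from /usr/local/bin.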

If you want to use vpopmail's vdelivermail to avoid requiring procmail, maildrop, or safecat, you will need the following patch to vdelivermail.c, based on vpopmail version 5.4.10. I am not a patch guru and there is probably a better way of presenting this. Let me know if there are any problems. I have not yet tested applying this patch to the vpopmail sources!

==============================================================
--- vdelivermail.orig.c Wed Apr 6 05:07:36 2005
+++ vdelivermail.c Wed Apr 6 05:15:06 2005
@@ -52,6 +52,7 @@
struct vqpasswd *vpw;
off_t message_size = 0;
char bounce[AUTH_SIZE];
+char option[AUTH_SIZE];/*kkb*/
int CurrentQuotaSizeFd;

#ifdef QMAIL_EXT
@@ -177,6 +178,11 @@
vexit(111);
}

+ /*kkb begin*/
+ /* get the obsolete/option argument */
+ strncpy(option, argv[1], sizeof(option) - 1); /* option is a zero-initialized global, so it stays NUL-terminated */
+ /*kkb end*/
+
/* get the last parameter in the .qmail-default file */
strncpy(bounce, argv[2], sizeof(bounce));

@@ -733,6 +739,14 @@
int deliver_err;

chdir(dir);
+
+ /*kkb begin*/
+ if ( strcmp(option, "-d") == 0 ) {
+
+ /* feature disabled, so just return */
+ return(-1);
+ }
+ /*kkb end*/

/* format the file name */
if ( (fs = fopen(".qmail","r")) == NULL ) {
==============================================================

This patch is discussed in the thread "how to do simple vpopmail delivery with filtering" on the "vchkpw" mailing list (vpopmail mailing list). For more information you may want to read this thread.

Good luck with any use you may make of this, and please feel free to send feedback to:

mailfilter DOT recipe AT breathhost DOT net.

Apologies for the preliminary state of this information, but again, please understand that this information as currently provided is intended only as a help to someone who already knows what they are doing. If this description does not match you, and you choose to try to learn from it, you agree to accept the consequences of your choices. If the description does match you, then since you know what you are doing, you are of course already responsible for your choices.

-Kurt Bigler, Berkeley, CA 4/29/05