2004-12-29

Spam prevention process

A quick outline of the process I'll be using to prevent comment spam.

  1. Hidden field hashing, as described in "Comment spam prevention". This also forces a preview, which may fool spambots, and lets you check the comment for such things as invalid HTML or too many links without having to run all of the following checks for the preview.
  2. Check the IP address against Spamhaus, DSBL, and any other RBLs that the user specifies. If one matches, block the IP for a short period and reject the comment.
  3. Find URIs.
  4. Check URIs against a blacklist, such as MT-Blacklist, or a personal blacklist such as Simon Willison's blacklist. If one matches, block the IP for a short period and reject the comment.
  5. Check URIs against SURBL. If one matches, block the IP for a short period and reject the comment.
  6. Run the comment through a Bayesian filter. If the comment closely matches known spam, block the IP for a short period and reject the comment. If the filter is unsure, move the comment to the moderation queue.
  7. Optionally, for the paranoid:
    1. Follow each link in the comment, including any redirects, and check all the links on the resulting page.
    2. Force all comments to join the moderation queue, unless the user accepts an email verification or is authenticated through other means (TypeKey, site-specific registration).
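
As a rough illustration of steps 2, 3 and 5, here is a minimal Python sketch of how the lookups might be put together. The function names, the URI pattern and the injected `resolve` parameter are my own assumptions for illustration, not part of any existing tool. DNS-based lists are conventionally queried by reversing the IPv4 octets and appending the list's zone; for URIs this sketch simply uses the full host name, whereas a real SURBL client would normally reduce it to the registered domain first.

```python
import re
import socket
from urllib.parse import urlsplit

# Deliberately loose pattern (an assumption): any http(s) URI up to
# whitespace, an angle bracket or a quote.
URI_RE = re.compile(r"https?://[^\s<>'\"]+", re.IGNORECASE)


def find_uris(text):
    """Step 3: pull http(s) URIs out of the comment body."""
    return URI_RE.findall(text)


def rbl_query_name(ip, zone):
    """Step 2: reverse the IPv4 octets and append the list's zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone


def surbl_query_name(uri, zone="multi.surbl.org"):
    """Step 5: URI lists are queried by host name rather than by IP.

    Simplification: the whole host name is used as-is.
    """
    host = urlsplit(uri).hostname or ""
    return host + "." + zone


def is_listed(query_name, resolve=socket.gethostbyname):
    """A listed name resolves; an unlisted one fails to resolve.

    `resolve` is injectable so the logic can be exercised without DNS.
    """
    try:
        resolve(query_name)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
```

For example, `is_listed(rbl_query_name(ip, "sbl.spamhaus.org"))` would cover one RBL in step 2, and looping over `find_uris(comment)` with `surbl_query_name` would cover step 5. A personal blacklist (step 4) could be checked against the same list of URIs before any DNS query is made.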

If the comment passes all these tests, it is probably not spam. Any transformations can then be performed and the comment stored in the database.
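
The overall decision flow could be sketched along these lines in Python. This is only an outline under my own assumptions: the names `screen`, `Verdict` and `blocked_until`, the 15 minute block and the two score thresholds are all hypothetical, and each entry in `checks` stands in for one of the list lookups (RBL, blacklist, SURBL), returning True on a match. Rejecting an IP that is still blocked is implied by "block the IP for a short period" rather than spelled out above.

```python
import time
from enum import Enum


class Verdict(Enum):
    ACCEPT = "accept"
    MODERATE = "moderate"
    REJECT = "reject"


BLOCK_SECONDS = 15 * 60  # hypothetical length of the "short period"
blocked_until = {}       # ip -> unix time at which the block expires


def screen(comment, ip, checks, spam_score,
           reject_at=0.9, moderate_at=0.5, now=None):
    """Run the checks in order and return a Verdict for the comment."""
    now = time.time() if now is None else now

    # An IP that is still serving a temporary block is rejected outright.
    if blocked_until.get(ip, 0) > now:
        return Verdict.REJECT

    # List lookups: the first match rejects and blocks the IP.
    if any(check(comment, ip) for check in checks):
        blocked_until[ip] = now + BLOCK_SECONDS
        return Verdict.REJECT

    # Bayesian filter: a high score rejects, an unsure one is moderated.
    score = spam_score(comment)
    if score >= reject_at:
        blocked_until[ip] = now + BLOCK_SECONDS
        return Verdict.REJECT
    if score >= moderate_at:
        return Verdict.MODERATE

    return Verdict.ACCEPT
```

Only comments that come back as `Verdict.ACCEPT` would go on to be transformed and stored; `Verdict.MODERATE` ones would wait in the moderation queue.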
