Bye bye Blogspot...
From now on I'll be posting at porges.name. That is all.
Spent the afternoon coding the algorithms from SpamBayes into PHP. Hurrah for open-source!
A quick outline of the process I'll be using to prevent comment spam.
If the comment passes all these tests, it is probably not spam. Any transformations can then be performed and the comment stored in the database.
For the rule "RewriteRule ^blog(.*)$ escrib/Escrib.php?arg=$1":
Argh. And according to the mailing list, there is no way around this.
So, what happens when a user wants to use a percentage sign in their titles?
Phew. But:
Any suggestions on what I should do? I think this would break even further if I started trying out some UTF-8 tests. (Yes, it's still a long way until IRIs are supported!) I'd hate to allow only ASCII in permalinks... although if I can't guarantee that it won't break, I may have to. Either that, or only allow titles that come out the same after being encoded once and decoded twice.
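To make that last idea concrete, here is a rough sketch in Python (not the PHP this blog runs on). It assumes the path gets percent-decoded twice on its way to the script, once by the server before the rewrite rule matches and once more when the query string is parsed. The function name is made up for illustration.

```python
from urllib.parse import quote, unquote


def survives_double_decode(title):
    """Return True if a title is unchanged after one encode and two decodes."""
    encoded = quote(title, safe="")  # what would go into the permalink
    # Assumption: the URL is decoded once by the server and once by the script.
    return unquote(unquote(encoded)) == title


print(survives_double_decode("100% pure"))  # True: "% p" is not a valid escape
print(survives_double_decode("50%25 off"))  # False: comes back as "50% off"
```

A title only gets mangled when a percent sign happens to be followed by two hex digits, so such a check would reject far fewer titles than an ASCII-only rule.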
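Update: D'oh! I'm a stupid-head. $_SERVER['REQUEST_URI'] holds the request exactly as the browser sent it, still encoded, so the script can decode it once by itself. Still, I'm leaving this here so others can learn from it :)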
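The same idea sketched in Python rather than PHP: take the raw request URI, drop the query string and the "/blog" prefix from the rewrite rule, and decode exactly once. The function name and the sample URIs are assumptions for illustration, not the actual Escrib code.

```python
from urllib.parse import unquote, urlsplit


def arg_from_request_uri(request_uri, prefix="/blog"):
    """Recover the permalink argument from the raw, still-encoded request URI."""
    path = urlsplit(request_uri).path  # drops any ?query part
    if not path.startswith(prefix):
        return None
    # Decode once only; nothing upstream has touched this string.
    return unquote(path[len(prefix):])


print(arg_from_request_uri("/blog/100%25-pure"))  # /100%-pure
print(arg_from_request_uri("/blog/50%2525-off"))  # /50%25-off
```

Because the string is decoded only once, a literal "%25" in a title survives the round trip.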
"イメージプレス" (from Standing Tall) transcribes to "imējipuresu", which I believe means "Imagepress".
Similarly, "カテゴリー" transcribes to "kategorī" ("categories"), "リンク" to "rinku" ("links"), and "アーカイブ" to "ākaibu" ("archives").
Of course, the macrons would need to be stripped as well. I'm still looking for a reference for kanji characters. Pointers would be appreciated!
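A minimal sketch in Python of how the transcription and macron-stripping might go. The table only covers the katakana in the words above and is nowhere near a full mapping. The function names are invented for illustration, and kanji are not handled at all.

```python
import unicodedata

# Partial table: only the kana needed for the examples in this post.
KATAKANA = {
    "ア": "a", "イ": "i", "カ": "ka", "ク": "ku", "ゴ": "go",
    "ジ": "ji", "ス": "su", "テ": "te", "ブ": "bu", "プ": "pu",
    "メ": "me", "リ": "ri", "レ": "re", "ン": "n",
}
MACRON = {"a": "ā", "i": "ī", "u": "ū", "e": "ē", "o": "ō"}


def transcribe(text):
    """Transcribe katakana to romaji, turning the long-vowel mark into a macron."""
    out = []
    for ch in unicodedata.normalize("NFC", text):
        if ch == "ー" and out and out[-1][-1] in MACRON:
            # Lengthen the vowel of the previous syllable.
            out[-1] = out[-1][:-1] + MACRON[out[-1][-1]]
        else:
            out.append(KATAKANA.get(ch, ch))  # unknown characters pass through
    return "".join(out)


def strip_macrons(text):
    """Remove macrons (and any other combining marks) to leave plain ASCII vowels."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


print(transcribe("カテゴリー"))                  # kategorī
print(strip_macrons(transcribe("アーカイブ")))    # akaibu
```

Stripping loses the vowel length, so two different words could end up with the same permalink. A real version would need some way to deal with collisions.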