Dive Into Mark, a great blog whose name inevitably conjures slightly disturbing imagery when read the first few times, has a cool article on how Mark used Apache and the mod_rewrite module to stop evil spambots, spybots, and unwanted robots (by definition, those that don’t respect the Robots Exclusion Standard) from stealing his bandwidth, content and email addresses.
Some will say that the Internet is a public place, and if I don’t want something abused, I shouldn’t put it on the Internet. Well, that’s true. It is also true that if I don’t want to get mugged, I shouldn’t leave my house, and if I don’t want calls from telemarketers, I shouldn’t have a phone. But I like leaving my house, I like having a phone, and I like having this web site.
Amazingly, a particularly evil — if not malicious, then programmed by friggin’ idiots — “plagiarism prevention system” called Turnitin hit his site nearly 20,000 times in January, using up over a gigabyte of transfer bandwidth in the process.
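For the curious, this kind of blocking boils down to checking each request's User-Agent header against a list of unwanted robots and refusing anything that matches. Here is a minimal Python sketch of that decision. It is not Mark's actual mod_rewrite configuration: the pattern list, the `TurnitinBot` and `EmailSiphon` agent strings, and the `status_for` helper are all assumptions made for illustration.

```python
import re

# Hypothetical blocklist: these User-Agent patterns are illustrative
# assumptions, not the rules from Mark's article.
BLOCKED_AGENT_PATTERNS = [
    r"TurnitinBot",  # assumed agent string for the Turnitin crawler
    r"EmailSiphon",  # example of an address-harvesting bot
    r"^$",           # requests that send an empty User-Agent
]

# Combine the patterns into one case-insensitive regular expression.
_BLOCKED = re.compile(
    "|".join(f"(?:{p})" for p in BLOCKED_AGENT_PATTERNS),
    re.IGNORECASE,
)


def status_for(user_agent):
    """Return the HTTP status to send: 403 for a blocked agent, 200 otherwise."""
    return 403 if _BLOCKED.search(user_agent or "") else 200


if __name__ == "__main__":
    for agent in ["TurnitinBot/1.5", "Mozilla/5.0 (Windows)", ""]:
        print(repr(agent), status_for(agent))
```

In Apache itself, the same check would typically be written as `RewriteCond` lines testing `%{HTTP_USER_AGENT}`, followed by a `RewriteRule` with the `F` (forbidden) flag. Keep in mind that a robot can claim to be anything it likes in that header, so this only stops bots that identify themselves honestly.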