This cron script would sample the system load average every minute. When load reached levels that would block my own access to the server, the script searched the server logs for anyone scraping wiki. If found, the offender was denied access using the blunt instrument of .htaccess.
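The watchdog can be sketched roughly as follows. This is a hypothetical reconstruction, not the original script: the load threshold, log path, and .htaccess path are all illustrative assumptions.

```python
# Hypothetical sketch of the load-watching cron job; paths and the
# threshold are assumptions, not the actual configuration.
import re
from collections import Counter
from pathlib import Path

LOAD_LIMIT = 5.0                                 # 1-minute load trigger (assumed)
ACCESS_LOG = Path("/var/log/apache2/access.log") # assumed log location
HTACCESS = Path("/var/www/wiki/.htaccess")       # assumed wiki root

def one_minute_load():
    # the first field of /proc/loadavg is the 1-minute load average
    return float(Path("/proc/loadavg").read_text().split()[0])

def busiest_client(log_text):
    # the client address issuing the most requests in the recent log;
    # the address is the first field of each common-log-format line
    ips = re.findall(r"^(\S+)", log_text, re.MULTILINE)
    return Counter(ips).most_common(1)[0][0]

def watchdog():
    if one_minute_load() > LOAD_LIMIT and ACCESS_LOG.exists():
        ip = busiest_client(ACCESS_LOG.read_text())
        # the blunt instrument: deny that address outright
        with HTACCESS.open("a") as f:
            f.write(f"Deny from {ip}\n")
```

Run from cron once a minute, it only acts when the machine is already struggling, so a well-behaved reader never notices it.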
I recognized that it was a limitation of my implementation that I couldn't serve pages as fast as some robot wanted them. I wasn't eager to beef up my logic. Instead I constructed an elaborate mechanism for making and distributing pre-rendered pages through a network of mirrors.
Wiki normally generates pages on demand. However, some select pages are converted to HTML in a weekly batch process. These pages can be quickly served to readers seeking an overview of popular wiki topics.
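The weekly batch step amounts to running the renderer over a list of popular pages and saving the results as static files. A minimal sketch in that spirit, where `render_page` and the page titles are stand-ins, not the wiki's actual code:

```python
# Hypothetical weekly batch job: pre-render selected pages as static
# HTML that a plain web server (or a mirror) can serve directly,
# without invoking the wiki at all.
from pathlib import Path

def render_page(title):
    # stand-in for the wiki's on-demand renderer (an assumption)
    return f"<html><body><h1>{title}</h1></body></html>"

def batch_render(titles, outdir):
    # write each popular page to a static .html file
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    for title in titles:
        (out / f"{title}.html").write_text(render_page(title))
    return sorted(p.name for p in out.iterdir())
```

The resulting directory of files is cheap to serve and easy to copy wholesale to mirrors.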
As the net got faster there was little value in caching content overseas. The ultimate solution for protecting my server was to tweak a few Apache configuration parameters so that Apache itself simply became the bottleneck when I was being scraped.
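The tweak in question was presumably a cap on concurrency, so that a scraper ends up waiting in Apache's request queue rather than driving up system load. A sketch assuming the Apache 2.2 prefork MPM; the directive values are illustrative, not the actual settings:

```apache
# Limit concurrent request handlers so an aggressive client saturates
# Apache's queue, not the machine (values are assumptions)
<IfModule prefork.c>
    MaxClients     20
    ListenBacklog  50
</IfModule>
```

With a small worker pool, excess requests simply wait, and the server stays responsive to its operator.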