The server writes logs that are too big to read without the aid of ad-hoc computer tools. But pretty neat analysis can be done on a single command line.
I often notice that the server is struggling when I can't send email. I'll check and find it running at a load average of 50 or above. I'll shut down apache to regain control, deny further access to the offender, then try to figure out what they were doing.
I noticed that one user was running cgi/test 10 or 20 times a second. What's with this? The tiny cgi just reports query parameters, which don't change much from call to call.
I wondered, how long and how hard has this user been pounding my machine?
I grep the lines logging the script; extract the time from each; count the hits at each time; and format the counts as a graph.
grep cgi/test /var/log/httpd/access_log | perl -ne '/2012:(\d\d:\d)/; print "${1}0\n"' | sort | uniq -c | perl -pe '/\d+/;$x="|"x($&/1000);s/$/\t$x/'
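Spelled out one stage per line (the same command, with comments added here for readability):

grep cgi/test /var/log/httpd/access_log |         # keep only hits on the suspect script
  perl -ne '/2012:(\d\d:\d)/; print "${1}0\n"' |  # pull hh:m from the timestamp, append a 0
  sort |                                          # bring identical buckets together
  uniq -c |                                       # count the hits in each ten-minute bucket
  perl -pe '/\d+/;$x="|"x($&/1000);s/$/\t$x/'     # append one bar per thousand hits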
Here I've chosen to look at ten-minute buckets by matching hh:m and appending a trailing zero. I scale the graph accordingly, at one bar per thousand hits.
      4 04:30
      1 04:40
   7835 04:50  |||||||
  12191 05:00  ||||||||||||
  12460 05:10  ||||||||||||
  13059 05:20  |||||||||||||
  13130 05:30  |||||||||||||
  12932 05:40  ||||||||||||
  13410 05:50  |||||||||||||
  13491 06:00  |||||||||||||
   9374 06:10  |||||||||
  27600 06:20  |||||||||||||||||||||||||||
  35685 06:30  |||||||||||||||||||||||||||||||||||
  35639 06:40  |||||||||||||||||||||||||||||||||||
  19043 06:50  |||||||||||||||||||
The traffic stopped at 06:55 when I denied the offender further access to the server. The server still recorded the denied requests, so I could watch the request rate drop off second by second over three minutes.
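A tighter time match gives that second-by-second view. Something like this would do it (a sketch rather than the exact command I ran; matching the full hh:mm:ss gives one line per second, and the bar scale drops from thousands to tens to suit the smaller per-second counts):

grep cgi/test /var/log/httpd/access_log | perl -ne '/2012:(\d\d:\d\d:\d\d)/; print "$1\n"' | sort | uniq -c | perl -pe '/\d+/;$x="|"x($&/10);s/$/\t$x/'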
Running variations of this program, I could see that the request rate dropped off linearly. Could the user have spawned a mass of jobs that kept fetching cgi/test in a loop? Could I have been watching these jobs die one by one?
The launch at 04:50 looked human. The escalation at 06:20 looked human. But the controlled relief at 06:55 definitely looked like a computer's response.