Apache server stats with small and useful bash script.

Just copy this script to your web-server cgi-bin directory and enjoy.

The script with show the common errors like 404 Error, Internal Server Error and others. It will show the User agent distribution using simple commands like grep, uniq, awk and so on.

You would need to change the tfile – which is temporary file and also the access.log path in the next line.

Just re-direct the output to some file with html extenstion. You could even put this in the cron which re-directs the output to some html in server document root.

sudo grep $(date '+%d/%b/%Y') /var/log/apache2/access.log >$tfile
thits=$(grep -c . $tfile)
echo "Content-type: text/html"
echo ""
echo "Report for $(date '+%d/%b/%Y')

Report for $(hostname) on $(date ‘+%d/%b/%Y’)


echo "Total hits :: $thits"
echo ""
echo "User agent distribution :: "
#awk -F" '{print $6}' $tfile  | sort | uniq -c | sort -fr

awk -F" '{print $6}' $tfile | sed 's/(([^;]+; [^;]+)[^)]*)/(1)/' |sort |uniq -c|sort -fr

echo "
User response code: "
awk 'BEGIN{
a[206]="Partial Content";
a[301]="Moved Permanently";
a[304]="Not Modified";
a[401]="Unauthorised (password required)";
a[404]="Not Found";
a[500]="Internal Server Error";
{print $9 " => "a[$9]""}' $tfile | sort | uniq -c | sort -nr

echo "
404 Error Summary::"
awk '($9 ~ /404/)' $tfile | awk '{print $9,$7}' | sort | uniq -c|sort -nr|head -5

echo "
IP Visit counts :: "
awk '{print $1}' $tfile | grep -o "[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}" | sort -n | uniq -c | sort -nr|while read count ip
	name=$(echo $ip|/usr/bin/logresolve)
	printf "%5st%-15st%sn" $count $ip $name

# cat $tfile |grep -o "[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}" | sort -nr | uniq -c | sort -n

echo "
Top Agents :: "
cat $tfile | awk -F" '{print $6}'| sort -n | uniq -c | sort -nr |head -5 

echo "
Top urls ::"
cat $tfile  | awk -F" '{print $2}'| sort -n | uniq -c | sort -nr |head -5

echo -n "
Total Bytes ::: "
cat $tfile | awk '{ sum += $10 } END { if ( sum > 1024*1024) {print sum/(1024*1024)"Mb"}else if ( sum > 1024) {print sum/1024"Kb";}else print sum }'

echo -n "
Total Seconds :: "
cat $tfile  | awk '{ sum += $13 } END { print sum }'
# sed 's/.*GET (.*) HTTP.*/1/g' $tfile|awk -F/ '{if ( NF > 3 ) print $2"/"$3"/"$4; else print $2;}'|sort|uniq -c

Apache Server Stats – Download the script in zip file format.

And here is a sample ::

Report for <server> on <date>

Total hits :: 

User agent distribution :: 

User response code: 

404 Error Summary::

IP Visit counts :: 

Top Agents :: 

Top urls ::

Total Bytes ::: 

Total Seconds ::

Hope that it will be useful.

Enhanced by Zemanta

cvs add files recursively – not already in repository

When you have a lot of files in some repository and you have added a couple of new, in CVS there is no command to add just the new ones to the repository, so here is a workaround for that.

cvs status 2>/dev/null | awk '{if ($1=="?")print "cvs add -kb " $2}'

Well, if you are adding text files then you might want to remove the “-kB” in the cvs command above.


Enhanced by Zemanta

BASH Script Performace

Today we will look at some bash code snippests and the performance issues. Lets first look at the problem and the implemented solution:

Problem: We needed to log the output of the ps command for all the process’s. This was required to be done on per minute basis and the output was required in comma separated files. So, here is what was implemented:

        pslog=`ps -e -opid,ppid,user,nlwp,pmem,vsz,rss,s,time,stime,pri,nice,pcp:u,args|grep -v PID|sort -r -k 13,13`
        logarr=( $pslog )
        for LOGLINE in ${logarr[@]}
                LOGLINE=`echo $LOGLINE|awk '{OFS=",";print $1,$2,$3,$4,$5,$6,$7,:$8,$9,$10,$11,$12,$13,$14}'`
                echo $LOGLINE >> output

This was working well and there were no issues. But suddenly we started seeing issues with the reported CPU usages. We would see that whenever this script was running the CPU usage was high, specially if there were too many process’s/thread’s on the system during that time.  This code was definitely part of a very large code base, and at this point of time we did not know what was causing the issues.

So, a drill started and luckily I was testing this on a Solaris box, so I took some dtrace scripts from DTraceToolkit and started looking at what was happening.  But before this we had already spent quite some time in trying to figure out the issues and fixing a lot of them, to no avail 🙁

After doing some dtracing, the problem was soon evident:

pslog=`ps -e -opid,ppid,user,nlwp,pmem,vsz,rss,s,time,stime,pri,nice,pcp:u,args|grep -v PID|sort -r -k 13,13`
        logarr=( $pslog )
        for LOGLINE in ${logarr[@]}
                LOGLINE=`echo $LOGLINE|awk '{OFS=",";print $1,$2,$3,$4,$5,$6,$7,:$8,$9,$10,$11,$12,$13,$14 }'`
                echo $LOGLINE >> output

Looking at the code again, we store each line of the output in array and then cycle through the same. Do awk on each line to print the same and redirect the same to output file. Now, that looks like a bad design but should not cause drastic CPU usage, but alas that was the case.

Now, we changed the script to :

ps -e -opid,ppid,user,nlwp,pmem,vsz,rss,s,time,stime,pri,nice,pcpu,args|grep -v PID|sed -e 's/ \\{1,\}/,/g;'  >> output

A simple one liner does the job. After making this change I am not relived as we do not see any unexpected CPU peeks 🙂

Enhanced by Zemanta