Apache server stats with a small and useful bash script.

Just copy this script to your web-server cgi-bin directory and enjoy.

The script will show common errors such as 404 Not Found and 500 Internal Server Error, as well as the user agent distribution, using simple commands like grep, uniq and awk.

You will need to change tfile, which is the temporary file, and the access.log path on the line after it.

Just redirect the output to a file with an .html extension. You could even run the script from cron and have it write the output to an HTML file in the server's document root.
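One wrinkle when redirecting to a static file: the script prints a CGI header (the Content-type line followed by a blank line) before the report, and that header would end up at the top of the HTML file. Here is a minimal sketch of one way around it. The helper name, the script name apache_stats.sh and the output path are all made up for illustration:

```shell
# Hypothetical helper: drop everything up to and including the first blank
# line (the CGI header block) and pass the rest through unchanged.
strip_cgi_header() {
    sed '1,/^$/d'
}

# Assumed usage (script name and target path are examples only):
#   ./apache_stats.sh | strip_cgi_header > /var/www/html/stats.html
```

The same pipeline could be used as the command in a cron entry.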

#!/bin/bash
# Temporary file for today's log entries (example path, change as needed).
tfile=/tmp/apache_stats.tmp
sudo grep "$(date '+%d/%b/%Y')" /var/log/apache2/access.log >$tfile
thits=$(grep -c . $tfile)
echo "Content-type: text/html"
echo ""
echo "Report for $(hostname) on $(date '+%d/%b/%Y')"
echo ""


echo "Total hits :: $thits"
echo ""
echo "User agent distribution :: "
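#awk -F\" '{print $6}' $tfile  | sort | uniq -c | sort -fr

# Field 6 (split on double quotes) is the user agent; sed keeps only the
# first two items inside the parentheses so similar agents group together.
awk -F\" '{print $6}' $tfile | sed 's/(\([^;]\+; [^;]\+\)[^)]*)/(\1)/' | sort | uniq -c | sort -nr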

echo "
User response code: "
awk 'BEGIN{
a[206]="Partial Content";
a[301]="Moved Permanently";
a[304]="Not Modified";
a[401]="Unauthorised (password required)";
a[404]="Not Found";
a[500]="Internal Server Error";
}
{print $9 " => "a[$9]""}' $tfile | sort | uniq -c | sort -nr

echo "
404 Error Summary::"
awk '($9 ~ /404/)' $tfile | awk '{print $9,$7}' | sort | uniq -c|sort -nr|head -5

echo "
IP Visit counts :: "
awk '{print $1}' $tfile | grep -o "[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}" | sort -n | uniq -c | sort -nr | while read count ip
do
	# logresolve turns the IP address into a host name where it can
	name=$(echo $ip | /usr/bin/logresolve)
	printf "%5s\t%-15s\t%s\n" $count $ip $name
done

# cat $tfile |grep -o "[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}" | sort -nr | uniq -c | sort -n

echo "
Top Agents :: "
awk -F\" '{print $6}' $tfile | sort | uniq -c | sort -nr | head -5

echo "
Top urls ::"
awk -F\" '{print $2}' $tfile | sort | uniq -c | sort -nr | head -5

echo -n "
Total Bytes ::: "
cat $tfile | awk '{ sum += $10 } END { if ( sum > 1024*1024) {print sum/(1024*1024)"Mb"}else if ( sum > 1024) {print sum/1024"Kb";}else print sum }'

echo -n "
Total Seconds :: "
cat $tfile  | awk '{ sum += $13 } END { print sum }'
# sed 's/.*GET \(.*\) HTTP.*/\1/g' $tfile|awk -F/ '{if ( NF > 3 ) print $2"/"$3"/"$4; else print $2;}'|sort|uniq -c
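The field numbers used throughout the script assume Apache's combined log format. A quick way to check them against your own log is to run the same awk calls on a single line. The sample line below is made up, and this is only a sketch for verifying the field positions:

```shell
# Made-up sample line in Apache "combined" log format.
line='192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /missing.html HTTP/1.1" 404 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"'

# Split on whitespace: $1 = client IP, $7 = URL, $9 = status, $10 = bytes.
ip=$(printf '%s\n' "$line" | awk '{print $1}')
url=$(printf '%s\n' "$line" | awk '{print $7}')
status=$(printf '%s\n' "$line" | awk '{print $9}')
bytes=$(printf '%s\n' "$line" | awk '{print $10}')

# Split on double quotes: $2 = request line, $6 = user agent.
request=$(printf '%s\n' "$line" | awk -F\" '{print $2}')
agent=$(printf '%s\n' "$line" | awk -F\" '{print $6}')

printf 'ip=%s url=%s status=%s bytes=%s\n' "$ip" "$url" "$status" "$bytes"
printf 'request=%s\nagent=%s\n' "$request" "$agent"
```

If a value lands in a different field on your server, the log probably uses a custom format and the field numbers in the script need adjusting.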

Apache Server Stats – Download the script in zip file format.

And here is a sample ::

Report for <server> on <date>

Total hits :: 

User agent distribution :: 

User response code: 

404 Error Summary::

IP Visit counts :: 

Top Agents :: 

Top urls ::

Total Bytes ::: 

Total Seconds ::

Hope that it will be useful.


Ranking of the most frequently used commands

Let's take a quick look at how to get the most frequently used commands in your shell. This is the command we need:

history | awk '{print $2}' | awk 'BEGIN {FS="|"}{print $1}' | sort | uniq -c | sort -n | tail | sort -nr

So, how did we arrive at this, and will it always work? No, not always. A typical example is when the HISTTIMEFORMAT variable is set. In that case the history output has the date and time, in the specified format, right after the number column, so the second field is part of the timestamp and the command above gives wrong results. Leaving such special cases aside, let's go through how we got this command:
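One possible workaround when HISTTIMEFORMAT is set is to skip over the timestamp fields. The sketch below assumes the timestamp always splits into a fixed number of whitespace-separated fields (two for a format such as '%F %T '). The helper name command_word is hypothetical:

```shell
# Hypothetical helper: print the command word from `history`-style lines.
# $1 = number of whitespace-separated fields the timestamp occupies
#      (0 when HISTTIMEFORMAT is unset, 2 for a format like '%F %T ').
command_word() {
    ts_fields=${1:-0}
    awk -v col="$((2 + ts_fields))" '{print $col}'
}

# Assumed usage in an interactive shell:
#   history | command_word 2 | awk -F'|' '{print $1}' | sort | uniq -c | sort -nr | head
```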

history|awk '{print $2}'

gives us the first word of every command in the history. A command like “history|more” contains no spaces, though, so it shows up as a single word. To fix that, we drop everything after the “|” with another awk command:

history|awk '{print $2}'|awk -F"|" '{print $1}'

or, equivalently:

history | awk '{print $2}' | awk 'BEGIN {FS="|"}{print $1}'

and now, to get the counts, we need to sort and then count the unique occurrences:

history | awk '{print $2}' | awk 'BEGIN {FS="|"}{print $1}'|sort|uniq -c

Finally, to make the output more readable, add sort -n to sort numerically on the first column, then tail to keep only the last few lines, and then sort -nr to reverse the order so that the top entry is the most used one 🙂
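To see the ordering step in isolation, here is a small sketch that feeds made-up history lines through the same pipeline. The function name top_commands is hypothetical. It reads history-style lines on standard input, so it can be tried without touching the real history:

```shell
# Hypothetical wrapper around the pipeline described above; reads
# `history`-style lines on stdin and prints the most used commands first.
top_commands() {
    awk '{print $2}' | awk -F'|' '{print $1}' |
        sort | uniq -c | sort -n | tail | sort -nr
}

# Made-up sample history: ls is used three times, cd twice, history once.
printf '%s\n' \
    '  1  ls -l' \
    '  2  cd /tmp' \
    '  3  ls' \
    '  4  history|more' \
    '  5  ls -a' \
    '  6  cd ..' | top_commands
```

As a design note, sort -n | tail | sort -nr selects the same ten lines as sort -nr | head, so the shorter form saves one sort.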


Find duplicate entries in a list in bash with sed

Here I will use the rss2email feed list as an example, but the concept applies to any list.

Here is an example of the output of the r2e list command:

1: http://blog.amit-agarwal.co.in/feed (default: amitag@localhost)
2: http://feeds2.feedburner.com/AllAboutLinux (default: amitag@localhost)
3: http://feeds2.feedburner.com/Command-line-fu (default: amitag@localhost)
4: http://blogs.members.freewebs.com/Members/Blogs/viewBlogRSS.jsp?userid=29731143 (default: amitag@localhost)

The goal is to list all duplicate entries, if any. So first we need to remove the numbers from the beginning and the email ID from the end.

We will use sed to remove the email and the numbers. Here is what we can use. The first command strips the leading number, the second strips the email part:

sed 's/^[0-9]*: //'


sed 's/ (.*//'

So, let's try it now with

r2e list |sed 's/^[0-9]*: //' |sed 's/ (.*//'

If you see just the lines we are interested in, it is time to use the uniq command. Since uniq -d only reports duplicates on adjacent lines, we sort the list first:

r2e list |sed 's/^[0-9]*: //' |sed 's/ (.*//' |sort |uniq -d
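The whole idea can be tried on made-up data without rss2email installed. This is a minimal sketch, with a hypothetical function name and example.com feeds standing in for real ones. It merges the two sed calls into one with -e, and sorts before uniq -d because uniq only compares adjacent lines:

```shell
# Hypothetical helper: read `r2e list`-style lines on stdin and print each
# feed URL that appears more than once.
duplicate_feeds() {
    sed -e 's/^[0-9]*: //' -e 's/ (.*//' | sort | uniq -d
}

# Made-up sample list; entries 1 and 3 point at the same feed.
printf '%s\n' \
    '1: http://example.com/feed (default: user@localhost)' \
    '2: http://example.org/rss (default: user@localhost)' \
    '3: http://example.com/feed (default: user@localhost)' | duplicate_feeds
```

With the real tool, the assumed usage would be r2e list | duplicate_feeds.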