perl is faster than bash in some cases.

2011-01-11 3 min read bash Fedora Linux perl

Some days back, I had to generate some data to be uploaded to a database. As usual, I assumed that bash would be faster and so wrote the script that creates the files in bash. But I found that even after 5 hours I was only 10% done with the data generation, which meant it would take around 50 hours to complete. Something did not look right to me, so I asked one of my colleagues. He suggested I do a strace.

A quick strace on the PID was shocking, but it made very clear what was happening.

strace -p <PID>

Here’s an explanation of what was happening:

We saw that every write looked like this:

write(1, "a\n", 2)                      = 2
dup2(10, 1)                             = 1
fcntl64(10, F_GETFD)                    = 0x1 (flags FD_CLOEXEC)
close(10)                               = 0

We knew these are very costly calls for the CPU and immediately understood what we had to do. What was actually happening was that for each echo command the FD was being opened, the file appended, and then the FD closed again. This made it very clear why the script was running so slowly. So I quickly did some tests to verify that fixing this would solve the issue I was facing.
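For reference, here is a minimal sketch of the kind of loop that produces a trace like the one above; the loop and the file name out.txt are just illustrative, not the original data-generation script. Every iteration redirects with >>, so the shell opens the file, appends one line, and closes it again.

for i in $(seq 1 100000); do
    # each >> makes the shell open, append to, and close out.txt
    echo "record $i" >> out.txt
done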

I wrote one bash and one perl script to test this and ran time on both. Here are the two programs, bash first and perl second, with the time output for each.

echo a > test
echo a > test
echo a > test
echo a > test
echo a > test

time output:

real    0m0.020s
user    0m0.004s
sys    0m0.005s

open FILE, ">test";
print FILE "test";
print FILE "test";
print FILE "test";
print FILE "test";
print FILE "test";
close FILE;

time output:

real    0m0.035s
user    0m0.001s
sys    0m0.008s

One more test, with the bash script appending instead of truncating, to confirm the result:

echo a >> test
echo a >> test
echo a >> test
echo a >> test
echo a >> test

time output:

real    0m0.018s
user    0m0.006s
sys    0m0.003s

As you can see, the perl script took much less user time on the CPU. That is because the file was opened only once, all the output was written, and then the file was closed, so the perl script performs far fewer file operations than the equivalent bash script. The time taken by the bash script can also be decreased drastically if we open the file only once in bash, as shown in the sketch below. So the lesson I learned was: if there are operations you can remove from your script, even if they do not seem to be a serious issue in the beginning, removing them can improve performance greatly.
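For example, here is a rough sketch of that fix in bash (the file name and descriptor number are just illustrative, not the original script): open the file once with exec, write through the already-open descriptor, and close it at the end, so strace shows a single open/close pair instead of one per echo.

exec 3>> out.txt            # open the output file once on FD 3
for i in $(seq 1 100000); do
    echo "record $i" >&3    # write through the already-open descriptor
done
exec 3>&-                   # close FD 3 once, at the end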
