Bash script performance issue and remediation
2024-06-17
I was looking at writing some output to files. The output did not vary much, but the number of lines was huge. It was a long time ago, so I don't remember the context of the requirement, but I needed to write a few million lines of text. The quickest way to do this was bash, so I wrote a simple script. To my astonishment, the script took much more time than expected, so here is how I debugged the issue and fixed it.
Here I have taken a fictitious program to demonstrate the issue and am writing only 500K lines. On my laptop with an SSD, here is how much time it takes:
|
|
When testing for performance, always run the test at least three times to see whether the time taken varies between runs. As you can see above, the script consistently takes 7.1 to 7.2 seconds. I would have expected somewhat better results :)
So, let's first check the script (I used ChatGPT to create it). Here it is:
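A minimal sketch of repeating a timed run three times in bash. The `workload` function is a hypothetical stand-in for the script being measured. `TIMEFORMAT` and the `time` keyword are bash features, so this assumes bash rather than a plain POSIX shell.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the script being measured.
workload() {
    local i
    for ((i = 0; i < 20000; i++)); do :; done
}

TIMEFORMAT='%R'          # make bash's `time` keyword print only elapsed seconds
times=()
for run in 1 2 3; do
    # `time` reports on stderr, so redirect it into the command substitution.
    elapsed=$( { time workload; } 2>&1 )
    times+=("$elapsed")
    echo "run $run: ${elapsed}s"
done
```

If the three numbers are close to each other, the measurement is probably stable enough to compare against a modified version of the script.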
|
|
Pretty simple and as expected, so what is wrong? How can we improve this?
Let's first try to see what is wrong. I will run the script with strace. If you don't have it installed, you can install it with `dnf install strace` and then run the script with strace and check the output.
Initially you will see some libraries being opened and loaded, and some work being done to map the process into memory. We are not interested in those; we want to check what the process itself is doing, so I will skip some of the messages from the top.
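The original listing is not reproduced here, so the following is only a sketch of the kind of loop described, assuming it appends one line per iteration with `>>`. The file name, line text, and the reduced count of 1000 are illustrative assumptions, not the author's values.

```bash
#!/usr/bin/env bash
# Assumed shape of the slow script: one append redirection per line.
out=$(mktemp)            # hypothetical output file
count=1000               # the article's run used 500K lines

i=1
while [ "$i" -le "$count" ]; do
    # `>>` here makes the shell open, write to, and close the file on every pass.
    echo "This is line number $i" >> "$out"
    i=$((i + 1))
done

wc -l < "$out"
```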
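One way to cut through the startup noise is to ask strace for only the file-related system calls. The sketch below assumes a tiny five-line version of an append loop written to a temporary file; the script contents and paths are hypothetical. `-f`, `-e trace=` and `-o` are standard strace options. The run is skipped if strace is missing or tracing is not permitted on the machine.

```bash
#!/usr/bin/env bash
# Hypothetical five-line version of the append loop, written to a temp file.
script=$(mktemp)
trace=$(mktemp)
out=$(mktemp)
cat > "$script" <<EOF
for i in 1 2 3 4 5; do
    echo "line \$i" >> "$out"
done
EOF

if command -v strace >/dev/null 2>&1; then
    # -f follows child processes, -e trace= limits output to the listed
    # syscalls, -o writes the trace to a file instead of stderr.
    if strace -f -e trace=openat,write,close -o "$trace" bash "$script"; then
        grep -c 'openat(' "$trace"   # number of openat calls seen
    else
        echo "strace could not trace the script here" >&2
    fi
else
    echo "strace is not installed" >&2
fi
```

Reading `$trace` afterwards should make any repeated open, write, close pattern for the output file easy to spot.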
|
|
We can see the same pattern repeated in the output, which is expected. What is problematic is that the file is opened every time we write to it. In retrospect this is also expected, but think about how much time we are wasting because of it. Let's try to optimize this.
So, again, I know that we want to use `exec` to open a file descriptor and use that to write to the file. This avoids reopening the file for every line. I asked ChatGPT to make this optimization, and here is the updated script.
|
|
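The updated listing is also missing here, so this is a sketch of the technique just described, assuming descriptor 3 and the same illustrative file name and line text as a simple append loop would use. `exec 3> file` opens the file once, `>&3` writes to that descriptor, and `exec 3>&-` closes it.

```bash
#!/usr/bin/env bash
# Assumed shape of the optimized script: open once, write many, close once.
out=$(mktemp)            # hypothetical output file
count=1000

exec 3> "$out"           # open the file once on descriptor 3 (truncates it)

i=1
while [ "$i" -le "$count" ]; do
    echo "This is line number $i" >&3   # write to the already-open descriptor
    i=$((i + 1))
done

exec 3>&-                # close descriptor 3
wc -l < "$out"
```

Note that `3>` truncates the file when it is opened; `exec 3>> "$out"` would keep existing content and append instead.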
Let’s try this.
|
|
This takes about 4.5-4.9 seconds, roughly two-thirds of the 7.1-7.2 seconds the first version needed. This is a good improvement. It’s only 500K iterations with a short line. Think about doing 50M iterations; there the saving will definitely help.
Let’s check this in strace.
After the usual loading of libraries and other startup work, we can see the file being opened:
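To compare the two approaches on your own machine, both can be timed in one script. This is a sketch under assumptions: the function names, the count of 20000, and the line text are all hypothetical, and it relies on bash's `time` keyword and `TIMEFORMAT`. The absolute numbers will differ from the ones quoted above.

```bash
#!/usr/bin/env bash
count=20000
slow_out=$(mktemp)
fast_out=$(mktemp)

# Reopens the output file for every line.
write_reopen() {
    local i
    for ((i = 1; i <= count; i++)); do
        echo "line $i" >> "$slow_out"
    done
}

# Opens the output file once on descriptor 3 and reuses it.
write_fd() {
    local i
    exec 3> "$fast_out"
    for ((i = 1; i <= count; i++)); do
        echo "line $i" >&3
    done
    exec 3>&-
}

TIMEFORMAT='%R'          # elapsed seconds only
slow_time=$( { time write_reopen; } 2>&1 )
fast_time=$( { time write_fd; } 2>&1 )
echo "reopen per line: ${slow_time}s, single descriptor: ${fast_time}s"

# Both variants should produce byte-identical files.
cmp -s "$slow_out" "$fast_out" && echo "outputs are identical"
```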
|
|
After the file is opened, we see only the following:
|
|
And you can see that the file is no longer opened and closed for every write.
Hope this helps.
Authored By Amit Agarwal
Amit Agarwal, Linux and Photography are my hobbies. Creative Commons Attribution 4.0 International License.