Thursday, April 24, 2014

How to determine if a program is thrashing in Linux?

When a program is thrashing, it runs very slowly because there is not enough physical memory on the system. Consequently, a lot of disk reads and writes happen as data is swapped in and out of memory. For example, if a program needs to load new data into memory in order to run but no memory is available, the OS has to take something already in memory and write it to disk to free up the memory the program needs. This is a swap out. When the program later needs to access the old data on disk, the data has to be copied back into memory. This is a swap in.
If there is a lot of swapping in and out, it indicates the program is thrashing.

Suppose a program is running very slowly and you want to know whether thrashing is the cause.
Let's assume it is not slow simply because it is doing a complicated calculation that takes a lot of CPU cycles.
Here are the steps you can follow to tell whether it is thrashing:

1) First, check how much memory is available while the program is running.

>free    // running free tells you how much memory is available. You will not see free memory drop to zero, because the OS keeps a small reserve free so that it can operate properly.

~# free -m  // -m displays the values in MB
             total       used       free     shared    buffers     cached
Mem:          5800       5774         25          0          7        609
-/+ buffers/cache:       5157        642
Swap:        10047       5047       4999


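Low free memory does not always mean the program is thrashing: Linux deliberately uses otherwise idle memory for buffers and cache, so the "free" column on the Mem: line is normally small. The "-/+ buffers/cache" line is the better figure to read. Here only 642 MB is free even after counting buffers and cache, and about 5 GB of swap is already in use. That is a good hint that pages are being swapped in (si) and out (so), which is what the next step checks.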
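
The "-/+ buffers/cache" free figure is roughly free + buffers + cached (25 + 7 + 609 in the output above; the one-MB difference from 642 comes from each column being rounded to whole megabytes). As a minimal sketch, the same number can be computed from /proc/meminfo. The function name mem_really_free_mb is made up for this example, and it assumes the usual "Name: value kB" layout of that file.

```shell
# Sketch: sum MemFree + Buffers + Cached from meminfo-style text on stdin
# and print the total in MB, mirroring the "-/+ buffers/cache" free column.
# Assumes values are in kB, as /proc/meminfo reports them.
mem_really_free_mb() {
    awk '
        # Exact field match, so SwapCached is not counted as Cached.
        $1 == "MemFree:" || $1 == "Buffers:" || $1 == "Cached:" { kb += $2 }
        END { printf "%d\n", kb / 1024 }
    '
}

# Usage: mem_really_free_mb < /proc/meminfo
```

If this number stays small while swap usage keeps growing, it is worth moving on to the vmstat check.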

2) To see whether a lot of swapping is happening, run vmstat while the program is running.

# vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0 2802448  25856 731076    0    0    99    14  365  478  7  3 88  3
 0  0      0 2820556  25868 713388    0    0     0     9  675  906  2  2 96  0
 0  0      0 2820736  25868 713388    0    0     0     0  675  925  3  1 96  0
 2  0      0 2820388  25868 713548    0    0     0     2  671  901  3  1 96  0
 0  0      0 2820668  25868 713320    0    0     0     0  681  920  2  1 96  0

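The si and so columns show how much memory is swapped in from disk and out to disk per second. Here both are zero, so this system is not swapping. If these columns stay high sample after sample, it indicates the program is thrashing.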
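
Reading the columns by eye works for a handful of samples; for a longer run the check can be scripted. The following is a minimal sketch, assuming the vmstat column layout shown above, where si and so are the 7th and 8th fields. The function name swap_summary is invented for this example. It skips the first sample, which vmstat reports as an average since boot, and averages the remaining ones.

```shell
# Sketch: average the si and so columns of vmstat output read on stdin.
# Assumes si is field 7 and so is field 8, as in the output above.
swap_summary() {
    awk '
        $1 ~ /^[0-9]+$/ {          # data rows start with a number (the r column)
            n++
            if (n == 1) next       # first sample is the average since boot
            si += $7; so += $8; rows++
        }
        END {
            if (rows == 0) { print "no samples"; exit 1 }
            printf "avg si=%d so=%d\n", si / rows, so / rows
        }
    '
}

# Usage: vmstat 5 5 | swap_summary
```

Averages that stay well above zero for the whole run point to thrashing; a single brief spike usually does not.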
