
This might be old school, and maybe even boring reading. But if you care about performance on Linux servers, at some point you will have to look at the kernel messages.

When we run very demanding jobs on large servers (many CPUs and a lot of RAM) with very high IO activity, it is pretty common to start seeing messages like these in the dmesg kernel output:

    INFO: task kswapd1:1140 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

This indicates that the process issued a request to a block device, such as a disk or swap, and the request was not fulfilled for more than 120 seconds, so the kernel reported the task as blocked.

As mentioned before, the probability of observing this behavior increases on instances with a large number of vCores (e.g. 64+ vCores), because the volume of IO requests can be higher and the kernel buffer queues are not configured for such a load. This is also very common in clustered environments. On Hadoop/YARN environments, even though the framework will retry, this hurts performance and might also lead to application failures.

To solve this problem, we have to increase the dirty_background_bytes kernel setting to a value high enough to accommodate the throughput.

As a base formula, we usually consider a value of dirty_background_bytes=10MB for 40MB/sec of throughput. Otherwise, writes become synchronous, and a synchronous write involves waiting time.

In addition to the dirty_background_bytes kernel parameter, there are other settings we can adjust. The goal is to let the page cache absorb writes so that the OS can write to disk asynchronously whenever possible.
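
As a rough illustration of the base formula (dirty_background_bytes=10MB for 40MB/sec of throughput), the sketch below scales that ratio to other write rates and prints the matching sysctl command. The linear scaling, the treatment of "MB" as MiB, the helper names, and the sample throughputs are all assumptions made for the example, not a tuning recommendation; nothing is applied to the system.

```python
# Base ratio taken from the text: 10 MB of dirty_background_bytes per 40 MB/s.
BASE_DIRTY_MB = 10
BASE_THROUGHPUT_MB_S = 40
MIB = 1024 * 1024  # assumption: "MB" is interpreted as MiB


def dirty_background_bytes_for(throughput_mb_s: float) -> int:
    """Scale the base ratio linearly (an assumption) to a write throughput in MB/s."""
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    return int(throughput_mb_s * BASE_DIRTY_MB / BASE_THROUGHPUT_MB_S * MIB)


def sysctl_command(throughput_mb_s: float) -> str:
    """Build the sysctl command line; it is only returned, never executed."""
    value = dirty_background_bytes_for(throughput_mb_s)
    return f"sysctl -w vm.dirty_background_bytes={value}"


if __name__ == "__main__":
    # Hypothetical sustained write rates, chosen only for illustration.
    for rate in (40, 200, 400):
        print(f"{rate} MB/s -> {sysctl_command(rate)}")
```

Measure the actual sustained write throughput of the workload before picking a value, and check the result against dmesg after the change.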
