
CPU I/O wait

Babar_Qayyum
DynaMight Guru

Dear All,

What exactly is CPU I/O wait, particularly with reference to Kubernetes master/worker nodes?

Babar_Qayyum_0-1654679099608.png

Regards,

Babar

7 REPLIES

AntonioSousa
DynaMight Guru

@Babar_Qayyum,

I/O wait is the share of time the CPU sat idle while at least one of the processes it is scheduling was waiting for an I/O request (typically disk) to complete. It does not necessarily mean that the CPU cannot do other things, but in the case of your graph, it seems that one or more of the running processes were waiting for I/O.
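
For reference, on Linux this figure comes from the iowait counter on the aggregate "cpu" line of /proc/stat. Below is a minimal sketch of how the percentage could be computed from two samples. The function names are made up for illustration, and this is not necessarily how Dynatrace calculates the value shown in the graph.

```python
import time


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Field order as documented in proc(5). The guest counters are left out
    # because they are already included in user/nice.
    names = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]
    values = [int(v) for v in fields[1:1 + len(names)]]
    return dict(zip(names, values))


def iowait_percent(before, after):
    """Share of elapsed CPU time spent in iowait between two samples."""
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())
    if total <= 0:
        return 0.0
    return 100.0 * deltas["iowait"] / total


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as f:
        return f.readline()


if __name__ == "__main__":
    first = parse_cpu_line(read_cpu_line())
    time.sleep(1)  # sampling interval, chosen arbitrarily
    second = parse_cpu_line(read_cpu_line())
    print(f"iowait: {iowait_percent(first, second):.1f}%")
```

Run directly on the affected node, this prints the I/O wait percentage over a one-second window.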

It might be difficult to find out exactly what is going on. You could try correlating with the logs, or better, check disk activity by process in Dynatrace; you might get a quick clue there.

Finally, on a regular Linux host you could try enabling debug on the disk, but I am not sure it can be done here the way I usually do it on Linux.

Antonio Sousa

Hello @AntonioSousa 

I looked into the disk latency, especially for disk reads, and there is some latency, but the CPU I/O wait started a couple of minutes before it. How do you interpret this?

Babar_Qayyum_0-1654689228884.png

Regards,

Babar

AntonioSousa
DynaMight Guru

@Babar_Qayyum,

On the server that had issues, in the old host view, click "Consuming processes", then the "I/O" tab, and you should be able to figure out which process did the most I/O.
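
If you have shell access to the node, a similar per-process view can be approximated outside Dynatrace from the /proc/<pid>/io counters. The following is only a sketch under a few assumptions: it needs to run as root to see other users' processes, the kernel must have I/O accounting enabled, and the helper names are hypothetical.

```python
import os
import time


def parse_proc_io(text):
    """Parse the contents of /proc/<pid>/io into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def snapshot(proc_root="/proc"):
    """Return {pid: counters} for every process whose io file is readable."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "io")) as f:
                result[int(entry)] = parse_proc_io(f.read())
        except OSError:
            continue  # process exited in the meantime, or permission denied
    return result


def top_io(before, after, n=5):
    """Rank PIDs seen in both snapshots by bytes read plus written in between."""
    usage = []
    for pid, now in after.items():
        if pid not in before:
            continue
        prev = before[pid]
        delta = (now.get("read_bytes", 0) - prev.get("read_bytes", 0)
                 + now.get("write_bytes", 0) - prev.get("write_bytes", 0))
        usage.append((delta, pid))
    usage.sort(reverse=True)
    return [(pid, delta) for delta, pid in usage[:n]]


if __name__ == "__main__":
    first = snapshot()
    time.sleep(5)  # observation window, chosen arbitrarily
    for pid, delta in top_io(first, snapshot()):
        print(pid, delta)
```

Note that read_bytes and write_bytes only count traffic that reached the storage layer, so a process served entirely from the page cache will show up with little or no I/O.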

Antonio Sousa

Hello @AntonioSousa 

There is no I/O during the problem; the only thing that stands out is the maximum CPU used by "Other processes".

Babar_Qayyum_0-1654698726460.png

Babar_Qayyum_0-1654754009462.png

Regards,

Babar

OK, so it seems that the "Other processes" are grabbing the I/O. Does this happen often?

Antonio Sousa

Hello @AntonioSousa 

No. It recently happened with one of the OpenShift clusters.

Regards,

Babar

@Babar_Qayyum,

You could try to put the filesystem in debug mode to find out who is accessing it, but that of course generates a massive amount of data. Given that you are not able to replicate the issue, that would make it more difficult. But since the disk usage seems low, it might be feasible.
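
A lighter-weight check, if the issue shows up again, could be to list the processes in uninterruptible sleep (state "D" in /proc/<pid>/stat) while the I/O wait is high. Such tasks are often, though not always, blocked on I/O. This is a rough sketch with hypothetical helper names; it only looks at processes, not at individual threads.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the last ')' instead of splitting on whitespace.
    open_idx = text.index("(")
    close_idx = text.rindex(")")
    pid = int(text[:open_idx])
    comm = text[open_idx + 1:close_idx]
    state = text[close_idx + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """List (pid, comm) for processes currently in state 'D'."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited in the meantime, or unexpected content
        if state == "D":
            blocked.append((pid, comm))
    return blocked


if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(pid, comm)
```

A single run is only a point-in-time view, so it would need to be repeated a few times during the incident to see which processes keep appearing.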

Antonio Sousa