Chaos Engineering – Simulating CPU spike

Share
  • April 15, 2021

In this series of chaos engineering articles, let’s discuss how to simulate CPU consumption to spike up to 100% on a host (or container). CPU consumption will spike up whenever a thread goes on an infinite loop. Here is a sample program from the open-source BuggyApp application, which would cause the CPU to spike up.

SEE ALSO: What is Garbage collection log, Thread dump, Heap dump?

public class CPUSpikeDemo {

  public static void start() {
    new CPUSpikerThread().start();
    new CPUSpikerThread().start();
    new CPUSpikerThread().start();
    new CPUSpikerThread().start();
    new CPUSpikerThread().start();
    new CPUSpikerThread().start();
    System.out.println("6 threads launched!");
  }
}

public class CPUSpikerThread extends Thread {

  @Override
  public void run() {
		
    while (true) {
			
      // Just looping infinitely
    }
  }
}

In the above Java program, you will notice the ‘CPUSpikeDemo’ class. In this class, 6 threads with the name ‘CPUSpikerThread’ are launched. If you notice the ‘CPUSpikerThread’ class code, there is a ‘while (true)’ loop without any code in it. This condition will cause the thread to go on an infinite loop. Since 6 threads are executing this code, all the 6 threads will go on an infinite loop. When this program is executed, CPU consumption will skyrocket on the machine.

We launched the above BuggyApp program on a ‘t3a.medium’ EC2 instance, which has 2 CPUs. Below is the output from the UNIX performance monitoring tool ‘top’. You can notice the overall CPU % reaching out to 100%.

Fig: Top tool showing CPU consumption spiking up to 100%

How to diagnose CPU spike

As highlighted in this article, you can use manual approach to do root cause analysis:

  1. Capture thread dump from the application
  2. Capture ‘top -H -p {PID}’ output
  3. Marry these #a and #b and identify the root cause of the CPU spike problem

On the other hand, you can use automated root cause analysis tool like yCrash – which automatically captures application-level data (thread dump, heap dump, Garbage Collection log), system-level data (netstat, vmstat, iostat, top, top -H, dmesg,…) and marries these two datasets to generate instant root cause analysis report instantly. Below is the report generated by the yCrash tool when the above sample program is executed:

Fig: yCrash tool point out lines of code causing the CPU spike

SEE ALSO: What are the process states in Unix/Linux?

From the report, you can observe the yCrash is pointing out that 6 threads are causing the CPU to spike up. In the ‘CPU | Memory’ section of this report, you can notice that CPU consumption of each thread (which is > 30%) to be reported. You can also notice that tool is pointing out exact lines of code i.e., com.buggyapp.cpuspike.CPUSpikerThread.run(CPUSpikerThread.java:12) that is causing the infinite loop. Equipped with this information one can easily go ahead and fix the problematic code.

The post Chaos Engineering – Simulating CPU spike appeared first on JAXenter.

Source : JAXenter