r8 - 2013-05-16 - 21:31:21 - Main.harryabadi

Red Hat CPU spike

Authors: MichaelAfshar
Build basis: Red Hat Linux 5.6 or later

This page describes the tools and procedure for identifying the root cause of a CPU spike in a Red Hat Linux environment.

Data to be collected

system overview
vmstat data at 1-second granularity, timestamped
top output at 2- or 3-second granularity, also timestamped
top -H output samples taken during the CPU spike
javacores collected at intervals of (say) 30 seconds during the CPU spike
snapshots of the "Active Services" running at the time of the CPU spike

System overview

We start by building up a picture of the CPU resources available to the system. Typical questions are:

How many (virtual) processors are configured (or assigned to the LPAR)?
What SMT level is enabled?
If virtual, is the number of processors available to this LPAR capped or uncapped? What is this virtual machine's entitlement?
If uncapped, and depending on the entitlement, how large is the processor pool, how many other LPARs share processors in this pool, and how overcommitted is it?

Running the Red Hat Linux "xxxxx" command answers most of these questions.

% xxxxxxxx

See the xxxxxx man page for field-by-field descriptions of this output.

vmstat

To locate the time of the CPU spike, and to see whether user or system CPU is being consumed at that time, the best tool is vmstat, run with these flags:

% nohup vmstat -a 1 <iterations> > vmstat.out &

This runs vmstat in the background, wrapped in nohup so that it keeps running after you log out, and redirects its output to vmstat.out. Data is collected at 1-second granularity. A value of 604800 for <iterations>, which collects a week of data, may be appropriate, although the output file could then be quite large. If the mean time between CPU spikes is short (say, a day), a smaller value for <iterations> may be more practical.
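The <iterations> values on this page are simply the collection window divided by the sampling interval: 7 × 24 × 3600 = 604800 one-second samples for a week, or 201600 samples at a 3-second interval. Because vmstat -a prints no time on each line, the wall-clock time of a sample can be approximated as the start time plus the sample's position times the interval. This assumes the start time was recorded, for example with date, just before vmstat was launched. Below is a minimal Python sketch of both calculations. The function names are illustrative and are not part of any tool:

```python
from datetime import datetime, timedelta

WEEK_SECONDS = 7 * 24 * 60 * 60  # 604800


def iterations_for(window_seconds: int, interval_seconds: int) -> int:
    """Samples needed to cover window_seconds at interval_seconds per sample."""
    return window_seconds // interval_seconds


def sample_time(start: datetime, index: int, interval_seconds: int = 1) -> datetime:
    """Approximate wall-clock time of the index-th vmstat report (0-based).

    Index 0 is the first report, printed as soon as vmstat starts.
    Assumes `start` was recorded immediately before vmstat was launched
    (hypothetical workflow) and ignores small drift between reports.
    """
    return start + timedelta(seconds=index * interval_seconds)


if __name__ == "__main__":
    print(iterations_for(WEEK_SECONDS, 1))  # 1-second vmstat samples -> 604800
    print(iterations_for(WEEK_SECONDS, 3))  # 3-second top samples -> 201600
    print(sample_time(datetime(2013, 5, 16, 9, 0, 0), 3725))  # -> 2013-05-16 10:02:05
```

Over a week-long run, small delays between vmstat reports may accumulate, so treat reconstructed times as approximate.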

This level of granularity is necessary for a comprehensive view of CPU consumption. A system administrator may already be running other tools, such as sar, that collect data at a coarser granularity (for example, every 5 or 15 minutes). That output is useful for capacity planning, but it will not help in analyzing CPU spikes, which may be short-lived.

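Check the vmstat output to identify the time of the CPU spike, its duration, and its amplitude. While CPU consumption is high, also check whether the time is being spent in user mode (us), in the kernel (sy), or waiting for I/O (wa). Finally, check for excessive paging, for example sustained non-zero swap-in (si) and swap-out (so) values.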
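The checks above can be partly automated. The sketch below groups consecutive seconds in which the CPU is busy into spikes. For each spike it reports the start (in seconds after vmstat started), duration, peak, average us/sy/wa split, and whether si or so was non-zero. It assumes the standard procps column names and a 1-second interval. It counts everything except id as busy, so I/O-wait-heavy periods also show up. The 80% threshold and the script name vmstat_spikes.py are illustrative assumptions:

```python
"""Hypothetical helper (vmstat_spikes.py): find CPU spikes in `vmstat -a 1` output.

Sketch only. Assumes procps column names (us, sy, id, wa, si, so) and a
1-second interval, so the Nth sample is N seconds after vmstat started.
Busy time is taken as 100 - id. The 80% threshold is illustrative.
Usage: python3 vmstat_spikes.py vmstat.out
"""
import sys
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Row = Dict[str, int]


def parse_vmstat(lines: List[str]) -> List[Row]:
    """Return one dict per data row, keyed by the names in the column header."""
    header: List[str] = []
    rows: List[Row] = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("procs"):
            continue  # blank line or the "procs ---memory---" banner
        if fields[0] == "r":
            header = fields  # column-name line (may be repeated in the file)
        elif len(fields) == len(header) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(header, map(int, fields))))
    # vmstat's first report is an average since boot, not a 1-second sample.
    return rows[1:]


@dataclass
class Spike:
    start: int  # seconds after vmstat started
    samples: List[Row] = field(default_factory=list)

    @property
    def duration(self) -> int:
        return len(self.samples)

    @property
    def peak(self) -> int:
        return max(100 - s["id"] for s in self.samples)

    def mean(self, column: str) -> float:
        return sum(s[column] for s in self.samples) / len(self.samples)

    @property
    def paging(self) -> bool:
        return any(s["si"] > 0 or s["so"] > 0 for s in self.samples)


def find_spikes(rows: List[Row], threshold: int = 80) -> List[Spike]:
    """Group consecutive samples whose busy time (100 - id) >= threshold."""
    spikes: List[Spike] = []
    current: Optional[Spike] = None
    for second, row in enumerate(rows, start=1):
        if 100 - row["id"] >= threshold:
            if current is None:
                current = Spike(start=second)
                spikes.append(current)
            current.samples.append(row)
        else:
            current = None
    return spikes


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for spike in find_spikes(parse_vmstat(handle.readlines())):
            mix = {c: round(spike.mean(c)) for c in ("us", "sy", "wa")}
            print(f"t+{spike.start}s duration={spike.duration}s peak={spike.peak}% "
                  f"mostly={max(mix, key=mix.get)} mix={mix} paging={spike.paging}")
```

The threshold is a judgment call. A lower value catches smaller spikes at the cost of more noise.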

top output

Log top output to a file, for example with this command:

% nohup top -b -d 3 -n <iterations> > top.out &

This collects <iterations> samples, one every 3 seconds (-d 3), in top.out. Each sample begins with a header line showing the current time. A value of 201600 for <iterations>, which collects a week of data, may be appropriate, although the output file could then be quite large. If the mean time between CPU spikes is short (say, a day), a smaller value for <iterations> may be more practical.

Analyze the top output to see whether one process, several, or many are consuming a lot of CPU. If only the java process is consuming CPU, see the next section. If processes other than java are responsible for the CPU spike, determine whether they really need to run on this system.

top output for threads

Follow the directions in RedHat Java thread CPU monitoring to collect per-thread CPU data for the Java threads.

javacores

Run the waittool to collect javacores at 30-second intervals during the CPU spike. Look at the stack traces of the Java threads that, in the top -H output, are responsible for most of the CPU consumption.
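Matching the busiest top -H threads to javacore threads requires converting IDs. top reports thread IDs in decimal, while IBM javacores record the native thread ID in hexadecimal. The sketch below assumes the IBM J9 javacore layout: a 3XMTHREADINFO line with the quoted thread name, a following line containing "native thread ID:0x..." (older JVMs print "native ID:0x..."), and 4XESTACKTRACE lines for the Java frames. Other JVM levels may format these lines differently. The script and function names are hypothetical:

```python
"""Hypothetical helper (javacore_threads.py): find javacore threads by Linux thread ID.

Sketch only. Assumes the IBM J9 javacore layout: each thread begins with a
3XMTHREADINFO line (thread name in quotes), a following line carries
"native thread ID:0x..." (older JVMs print "native ID:0x..." instead), and
4XESTACKTRACE lines hold the Java stack frames.
Usage: python3 javacore_threads.py javacore.txt 5432 5433
"""
import re
import sys
from typing import Dict, List, Optional, Tuple

THREAD_START = re.compile(r"3XMTHREADINFO\s")
THREAD_NAME = re.compile(r'3XMTHREADINFO\s+"([^"]*)"')
NATIVE_ID = re.compile(r"native (?:thread )?ID:0x([0-9A-Fa-f]+)")


def threads_by_native_id(lines: List[str]) -> Dict[int, Tuple[str, List[str]]]:
    """Return {native thread ID: (thread name, [stack frames])}."""
    threads: Dict[int, Tuple[str, List[str]]] = {}
    name: Optional[str] = None
    frames: List[str] = []
    for line in lines:
        text = line.strip()
        if THREAD_START.match(text):
            named = THREAD_NAME.match(text)
            name = named.group(1) if named else None  # unnamed native threads are skipped
            frames = []
        native = NATIVE_ID.search(text)
        if native and name is not None:
            # The same list object keeps collecting the frames that follow.
            threads[int(native.group(1), 16)] = (name, frames)
        elif text.startswith("4XESTACKTRACE") and name is not None:
            frames.append(text.split(None, 1)[-1])
    return threads


def lookup(lines: List[str], tids: List[int], depth: int = 5):
    """For each decimal thread ID from top -H: (tid, hex id, name, top frames)."""
    threads = threads_by_native_id(lines)
    results = []
    for tid in tids:
        name, frames = threads.get(tid, ("<not in this javacore>", []))
        results.append((tid, f"0x{tid:X}", name, frames[:depth]))
    return results


if __name__ == "__main__" and len(sys.argv) > 2:
    with open(sys.argv[1], errors="replace") as handle:
        for tid, hex_id, name, frames in lookup(handle.readlines(), [int(t) for t in sys.argv[2:]]):
            print(f'{tid} ({hex_id}) "{name}"')
            for frame in frames:
                print("    " + frame)
```

Running this against successive javacores from the same spike shows whether a hot thread stays in the same code, which usually points at the culprit more reliably than a single snapshot.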

External links:

None

Additional contributors: None


Copyright © by IBM and non-IBM contributing authors. All material on this collaboration platform is the property of the contributing authors.
Contributions are governed by our Terms of Use. Please read the following disclaimer.