r3 - 2015-02-06 - 17:20:20 - Main.d3v3r1tt

RTC Performance Troubleshooting Basics and Data Collection

Author: Isabel Murakami
Build basis: Rational Team Concert

When a user says, "The system is too slow" or "CLM is down", what do they really mean? How do you quantify the user's perception of slow? Maybe the user expected, for example, a plan with 500 Work Items to open in nearly the same time as a small one with 50 Work Items. Or maybe just part of the network was down, affecting all users in a single region.

The only way to understand what is working, what is not working, and how long it is taking is to collect the data needed to measure how long each kind of access takes under normal usage and under high usage. Then, when the user says, "It is slow", you can compare against that baseline and confirm (or not) how much slower it is.

This guide walks the Collaborative Lifecycle Management (CLM) administrator through the main points to understand about a CLM installation and its usage, as well as the tools that can be used to collect data before, during, and after a period of slowness.

Performance Troubleshooting Basics

Keys to Understanding Performance Issues

Performance is relative to the user – we need to know the expected time under normal conditions and the time observed during the slowness
The business need of the tool – what is used most? SCM, reports, planning...
Performance issues can have many causes

Where is the performance degradation occurring?
1. Is only one user affected, or all users? For example, only users located in a certain building.
2. Is the performance degradation across the whole server?
When is the performance degradation occurring? For example, all the time, under light load, or under heavy load.
What exactly is slow? SCM, loading plans, builds, reports, dashboards? Or only certain components? For example, plans load slowly but work items are fine.
Are there any hangs, crashes, Out-of-Memory (OOM) errors, or high CPU usage?

Server versus Client side

When answering the question "where", we can normally tell whether the slowness is caused by a server-side issue (all users affected) or a client-side issue (some users affected). For example, all users connecting through VPN are affected while the others work fine, or a single user cannot access the application from Visual Studio (a configuration problem on one machine).

Characterizing the Performance Problem

The Duration (How long is it taking?)

Collect several data points, noting the exact date and time of each measurement and how long the task took to execute. Why? To compare the expected time against the actual time during the slowness, and to understand what is slow.
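
As a minimal sketch of that comparison, the snippet below stores a baseline and a slow-period measurement of the same task and reports the ratio between them. The `Measurement` fields, the task name, and the sample values are invented for illustration and do not come from any CLM tool.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Measurement:
    """One timed observation of a task (field names are illustrative)."""
    task: str
    taken_at: datetime   # exact date and time of the measurement
    seconds: float       # how long the task took


def slowdown(baseline: Measurement, observed: Measurement) -> float:
    """Return how many times slower the observed run is than the baseline."""
    if baseline.task != observed.task:
        raise ValueError("compare measurements of the same task only")
    if baseline.seconds <= 0:
        raise ValueError("baseline duration must be positive")
    return observed.seconds / baseline.seconds


# Made-up sample values: one run under normal load, one during the slowness.
normal = Measurement("open plan view", datetime(2015, 2, 2, 10, 0), 4.0)
slow = Measurement("open plan view", datetime(2015, 2, 6, 15, 30), 14.0)
print(f"{slow.task}: {slowdown(normal, slow):.1f}x slower than baseline")
```

Keeping the timestamp with each value makes it possible to line the measurement up later with server logs from the same moment.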

Prepare a #Test using tasks related to the activity the user finds slowest (a build, a specific query, saving a Work Item (WI), etc.). Always use the same tasks. For example:
- Choose a big Project Area (PA)
- Choose a Plan view with several WIs
- Choose a word for a quick search
- Prepare a query to run, for instance, open WIs in a selected PA
- Check in and deliver changes in a sample Project Area
- Run a sample build
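
One way to keep the test repeatable is to script the timing itself. The sketch below is only an assumption about how such a harness could look: `run_timed` and the step names are hypothetical, and the placeholder lambdas stand in for whatever actually performs each task (a REST call, a command-line client invocation, or a manual action that you time separately).

```python
import time
from datetime import datetime
from typing import Callable, Dict, List, Tuple


def run_timed(steps: Dict[str, Callable[[], object]]) -> List[Tuple[str, str, float]]:
    """Run each step once; return (name, ISO start time, elapsed seconds)."""
    results = []
    for name, step in steps.items():
        started_at = datetime.now().isoformat(timespec="seconds")
        t0 = time.perf_counter()
        step()
        elapsed = time.perf_counter() - t0
        results.append((name, started_at, elapsed))
    return results


# Placeholder steps: replace each body with the real action being measured.
steps = {
    "quick search": lambda: time.sleep(0.01),
    "open work items query": lambda: time.sleep(0.02),
}
for name, started_at, elapsed in run_timed(steps):
    print(f"{started_at}  {name}: {elapsed:.3f}s")
```

Running the same step list under normal load and again during the slowness gives two directly comparable sets of numbers.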

Performance issues at the Server and Client side:

Gather screenshots of the following pages from the web UI, one set for each application (example: jts, ccm, jazz, rqm):

https://<server>:<port>/<jts|ccm|qm>/admin#action=com.ibm.team.repository.admin.activeServices
https://<server>:<port>/<jts|ccm|qm>/service/com.ibm.team.repository.service.internal.counters.ICounterContentService
https://<server>:<port>/<jts|ccm|qm>/admin#action=com.ibm.team.repository.admin.serverStatus
https://<server>:<port>/<jts|ccm|qm>/admin#action=com.ibm.team.repository.admin.serverDiagnostics
https://<server>:<port>/<jts|ccm|qm>/service/admin?internal=true#action=com.ibm.team.repository.admin.statistics
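
Because the same five pages must be visited for every application, it can help to generate the full list of URLs up front. This sketch only expands the templates listed above; the host name and port in the usage lines are hypothetical, and `diagnostic_urls` is an illustrative helper, not part of any product.

```python
from typing import List

# Page suffixes copied from the URL list above.
ADMIN_PAGES = [
    "admin#action=com.ibm.team.repository.admin.activeServices",
    "service/com.ibm.team.repository.service.internal.counters.ICounterContentService",
    "admin#action=com.ibm.team.repository.admin.serverStatus",
    "admin#action=com.ibm.team.repository.admin.serverDiagnostics",
    "service/admin?internal=true#action=com.ibm.team.repository.admin.statistics",
]


def diagnostic_urls(server: str, port: int, apps: List[str]) -> List[str]:
    """Expand every diagnostic page for every application context root."""
    return [
        f"https://{server}:{port}/{app}/{page}"
        for app in apps
        for page in ADMIN_PAGES
    ]


# Hypothetical host and port; use the values of your own deployment.
for url in diagnostic_urls("clm.example.com", 9443, ["jts", "ccm", "qm"]):
    print(url)
```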

How many users are on the system at the moment? This specific question is difficult to answer; however, with PMI monitoring on WebSphere we can see how many requests WebSphere is currently processing.

Client side

Collect Metronome output and measure performance while executing the #Test steps
Collect HttpWatch output while executing the #Test steps

Understanding the Environment

Operating system of the user and server (RHEL, SUSE, Windows, etc)
Database vendor (MS SQL, Oracle, DB2)
Application Server (Tomcat or WebSphere)
Client type (web or eclipse) and Client version, Web Browser (Firefox, Internet Explorer, Chrome) and browser version
System Specifications: RAM, Heap Size, CPU, Virtualized or Physical machine?
Network Information:
1. Where is the DB located relative to the Application Server?
2. Where are the users located relative to the Application Server?
3. Is there a firewall, load balancer, proxy, etc.?
4. How and when are maintenance tasks executed? (backup, reindex, update statistics, automatic tasks to clean up temporary folders...)
Has a server rename occurred?
Is your server using IPv6? Are your clients using IPv6?
Which applications/products/APIs are accessing this Jazz Team Server (JTS) or DB repository? Requirements Management (RM), Quality Management (QM), CCM, RRDI, Insight, BuildForge, TaskTop, homemade applications?
What else is running in this environment and sharing the hardware resources?
Physical drive location (index location) – NFS, local/shared paths

Understanding the Size and Usage

How many active users? (JTS/Admin – active users)
Are all the users in the same time zone as the server?
How many concurrent users? (PMI settings on WebSphere Application Server (WAS))
Which days and times have the highest usage?
How many Project Areas?
What is the size of the biggest Project Area? And the smallest one?
What do users use most? Reporting, planning, SCM?
For check-in and delivery, what is the size of the biggest stream to be delivered?
What is the size of your DB repository?
How many builds and build engines are there, and how much source code is involved? Where is the build being executed (network, version)?

Data Collection

Collect the Configuration

Client side

Parameters used to start Eclipse

Server side

JVM parameters used for each application
(WAS) WebContainer thread pool size for each application: Servers -> Server Types -> WebSphere application server -> [Server Name] -> Thread Pools -> WebContainer.
(WAS) Asynchronous or synchronous mode: check whether the property com.ibm.ws.webcontainer.channelwritetype exists under Servers -> Server Types -> WebSphere application server -> [Server Name] -> Container Settings -> Web Container Settings -> Web Container -> Custom Properties.
Value of "Maximum Record Count" defined in the Advanced Properties for BIRT reports

DB connection parameters for each application:

https://<server>:<port>/<jts|ccm|qm>/admin#action=com.ibm.team.repository.admin.configureDatabaseConnection

IHS/plug-in configuration files

Prepare the Data Collection

Client side

Activate Metronome in the Eclipse client
(Firefox) Download and Install Firebug
(Firefox) Download and Install NetExport
(Firefox and Internet Explorer) Download and Install HttpWatch

Server side

Prepare a #TestConnection: ping and traceroute from the application server to the database server.
Enable PMI data collection, as described in "Tuning WebSphere servers for Rational Team Concert performance":
Monitoring and Tuning -> Performance Monitoring Infrastructure (PMI) -> [Server Name] -> Enable Performance Monitoring Infrastructure (PMI)

Performance Monitoring Infrastructure (PMI) > server1 > Custom monitoring level:

Threads – Enable ActiveCount, PoolSize
Servlet Session manager – Enable ActiveCount and LiveCount

(WAS) Enable verbose garbage collection logging by adding -verbose:gc to the JVM arguments, or by following "Enabling verbose garbage collection (verboseGC) in WebSphere Application Server"

(Tomcat) Enable garbage collection logging by adding the following JVM options to CATALINA_OPTS:

-verbose:gc
-Xloggc:$CATALINA_HOME/logs/gc.log    (on Windows: -Xloggc:%CATALINA_HOME%/logs/gc.log)
-XX:+PrintHeapAtGC    (only if more detail about memory consumption is needed; generates extra output)
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+HeapDumpOnOutOfMemoryError
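
Once gc.log exists, the pause times can be pulled out with a short script. The sketch below assumes HotSpot-style lines of the form `<uptime>: [GC ... , <n> secs]`, as written by the timestamp and detail options above on older HotSpot JVMs. The two sample lines are hand-written illustrations, not real output, and the exact layout varies with JVM vendor, version, and collector, so the regular expression will likely need adjusting for your log.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Assumed line shape: "<uptime>: [GC ... , <pause> secs]" or "[Full GC ...".
GC_LINE = re.compile(r"^(\d+\.\d+): \[(Full GC|GC)\b.*?, (\d+\.\d+) secs\]")

Pause = Tuple[float, str, float]  # (JVM uptime in seconds, kind, pause in seconds)


def gc_pauses(lines: Iterable[str]) -> List[Pause]:
    """Extract one (uptime, kind, pause) tuple per matching GC line."""
    pauses = []
    for line in lines:
        match = GC_LINE.search(line)
        if match:
            uptime, kind, secs = match.groups()
            pauses.append((float(uptime), kind, float(secs)))
    return pauses


def longest_pause(pauses: List[Pause]) -> Optional[Pause]:
    """Return the entry with the longest pause, or None if there are none."""
    return max(pauses, key=lambda p: p[2]) if pauses else None


# Hand-written sample lines in the assumed format.
sample = [
    "1.234: [GC [PSYoungGen: 65536K->10720K(76288K)] 65536K->10728K(251392K), 0.0123450 secs]",
    "5.678: [Full GC [PSYoungGen: 10720K->0K(76288K)] 120000K->50000K(251392K), 0.4567890 secs]",
]
print(longest_pause(gc_pauses(sample)))
```

Long or frequent Full GC entries around the time users reported slowness are usually the first thing to look for.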

WAIT

Register at WAIT and read the WAIT manual
Download the script on each application machine (jts/ccm/qm)

Out-of-Memory (OOM)

(WAS) Read "MustGather: Out of Memory errors with WebSphere Application Server on AIX, Linux, or Windows" and choose the best way to collect data. If needed, download and install the IBM Support Assistant Data Collector on the machines facing the OOM (jts/ccm/qm)

Data Collection during Slow Performance

Server side

Engage the database support team (DB2, Oracle, MS SQL) to check for locks, contention, and abnormal database behavior.
Execute the #TestConnection: ping and traceroute from the application server to the database.
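
The connection test can be scripted so that it is run identically each time. The following is a sketch under assumptions: `connection_commands` and `run_test_connection` are hypothetical helpers, the database host name is a placeholder, and the commands rely on the standard `ping` and `traceroute` (or `tracert` on Windows) utilities being installed on the application server.

```python
import platform
import subprocess
from datetime import datetime
from typing import List


def connection_commands(db_host: str, system: str) -> List[List[str]]:
    """Return the ping and traceroute commands for the given OS name."""
    if system == "Windows":
        return [["ping", "-n", "4", db_host], ["tracert", db_host]]
    return [["ping", "-c", "4", db_host], ["traceroute", db_host]]


def run_test_connection(db_host: str) -> str:
    """Run both commands and return their timestamped output as one report."""
    report = []
    for cmd in connection_commands(db_host, platform.system()):
        stamp = datetime.now().isoformat(timespec="seconds")
        report.append(f"### {stamp} {' '.join(cmd)}")
        try:
            done = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
            report.append(done.stdout + done.stderr)
        except (OSError, subprocess.TimeoutExpired) as exc:
            report.append(f"command failed: {exc}")
    return "\n".join(report)


# Show the commands that would run on this machine (placeholder host name).
print(connection_commands("db.example.com", platform.system()))
```

Calling `run_test_connection("<database host>")` on the application server and saving the returned text gives a timestamped record to compare with one taken under normal conditions.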

Collect WAIT Data

./waitDataCollector.sh --sleep 30 --iters 10 --javacoreDir <the full path to the directory where javacores will be written by the JVM>  [PID]

Hang, Crash or High CPU

Windows – Capture screenshots of the Task Manager (sorted by CPU). Also, use the Resource Monitor button in the Task Manager and take screenshots of all tabs with the sections expanded.

Server and Client side, if System is Accessible:

How many users are on the system at the moment? This specific question is difficult to answer; however, with PMI monitoring on WebSphere we can see how many requests WebSphere is currently processing.

Execute the #Test steps prepared in "Characterizing the Performance Problem".

Gather screenshots of the following pages from the web UI, one set for each application (example: jts, ccm, jazz, rqm):

https://<server>:<port>/<jts|ccm|qm>/admin#action=com.ibm.team.repository.admin.activeServices
https://<server>:<port>/<jts|ccm|qm>/service/com.ibm.team.repository.service.internal.counters.ICounterContentService
https://<server>:<port>/<jts|ccm|qm>/admin#action=com.ibm.team.repository.admin.serverStatus
https://<server>:<port>/<jts|ccm|qm>/admin#action=com.ibm.team.repository.admin.serverDiagnostics
https://<server>:<port>/<jts|ccm|qm>/service/admin?internal=true#action=com.ibm.team.repository.admin.statistics

Client side

Collect Metronome output and measure performance while executing the #Test steps
Collect HttpWatch output while executing the #Test steps

Data Collection after Slow Performance

Server side

ISALite (all machines JTS/CCM/QM)
IHS and plugin logs: http_plugin.log, access_log, error.log
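
A quick first pass over access_log is to count requests per minute around the time of the slowness. This sketch assumes the log uses the Apache Common Log Format timestamp (`[dd/Mon/yyyy:hh:mm:ss +zzzz]`); the LogFormat directive of your IHS may differ, and the sample lines below are invented for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Captures the timestamp down to the minute, e.g. "06/Feb/2015:15:30".
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}):\d{2} [+-]\d{4}\]")


def requests_per_minute(lines: Iterable[str]) -> Counter:
    """Count log lines per minute, keyed by the minute-level timestamp."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Invented sample lines in Common Log Format.
sample = [
    '10.0.0.5 - - [06/Feb/2015:15:30:01 +0000] "GET /ccm/web HTTP/1.1" 200 5120',
    '10.0.0.7 - - [06/Feb/2015:15:30:42 +0000] "GET /jts/admin HTTP/1.1" 200 2048',
    '10.0.0.5 - - [06/Feb/2015:15:31:03 +0000] "POST /ccm/service HTTP/1.1" 200 310',
]
for minute, count in sorted(requests_per_minute(sample).items()):
    print(minute, count)
```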

Statistics: Ensure you gather a screenshot for each application (example: jts, ccm, jazz, rqm):

https://<hostname>/<context root>/service/admin?internal=true#action=com.ibm.team.repository.admin.statistics

Out-of-Memory (OOM)

Collect the heap dump and core files that were created
If a core file was created, run jextract against it. The tool is included with the JRE in server/jre/bin:
```
jextract.exe <core file location>
```

Client side

Eclipse Client log

Additional contributors: DianeEveritt

Related topics: Initial Troubleshooting Investigation, How to start a troubleshooting assessment, Performance troubleshooting, Browser performance
