r10 - 2023-11-24 - 10:30:16 - SahilMansuri
<div id="header-title" style="padding: 10px 15px; border-width:1px; border-style:solid; border-color:#FFD28C; background-image: url(<nop>https://jazz.net/wiki/pub/Deployment/WebPreferences/TLASE.jpg); background-size: cover; font-size:120%">

---+!! Engineering Lifecycle Management Performance Troubleshooting Basics and Data Collection
<img src="" alt="" width="50" height="50" align="right">
%DKGRAY% Author: [[Main.IsabelMurakami][Isabel Murakami]] <br> Build basis: CLM 4.x - ELM 7.x %ENDCOLOR%</div></sticky>

<!--
   * Set ALLOWTOPICCHANGE = Main.TWikiAuthorsGroup
-->

<!-- Page contents top of page on right hand side in box -->
<sticky><div style="float:right; border-width:1px; border-style:solid; border-color:#DFDFDF; background-color:#F6F6F6; margin:0 0 15px 15px; padding: 0 15px 0 15px;">
%TOC{title="Page contents"}%
</div></sticky>

<sticky><div style="margin:15px;"></sticky>

This page was initially written for Rational Team Concert (RTC). In Engineering Lifecycle Management 7.x, DOORS Next adopted the same architecture as the other applications (RTC, now Engineering Workflow Management, and RQM, now Engineering Test Management), so this article now applies to all three main applications. It is also generally applicable to the other Java-based applications.

When a user says, "The system is too slow" or "ELM is down", what do they really mean? How do you quantify the user's perception of slow? Maybe the user expected a plan with 500 Work Items to be displayed in nearly the same amount of time it takes to open a small one with 50 Work Items. Or maybe just part of a network was down, affecting all users from a single region. The only way to understand *what is* working, *what is not* working, and *how long* it is taking is to collect the data needed to measure how long each kind of access takes under normal usage and under high usage. Then, when the user says, "It is slow", you can compare and confirm (or not) how much slower it is.
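The comparison described above can be sketched in a few lines of Python. This is only an illustration, assuming you already have a baseline duration and a duration measured during the slow period for each task. The names =Measurement=, =slowdown= and =report=, and the 1.5 threshold, are hypothetical and not part of ELM.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    task: str          # the activity that was timed, e.g. "open plan"
    baseline_s: float  # typical duration in seconds under normal usage
    observed_s: float  # duration in seconds during the reported slowness


def slowdown(m: Measurement) -> float:
    """Return observed/baseline; 1.0 means no change, 2.0 means twice as slow."""
    if m.baseline_s <= 0:
        raise ValueError("baseline duration must be positive")
    return m.observed_s / m.baseline_s


def report(measurements, threshold=1.5):
    """Return (task, ratio) pairs at or above the threshold, worst first.

    The threshold is an arbitrary illustrative cut-off, not an ELM setting.
    """
    flagged = [
        (m.task, round(slowdown(m), 2))
        for m in measurements
        if slowdown(m) >= threshold
    ]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)
```

Calling =report= with one =Measurement= per timed task returns only the tasks that are measurably slower than their baseline, which gives a concrete answer to "how much slower is it?".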
Here is a guide for the Engineering Lifecycle Management (ELM) administrator to the main points to understand about your ELM installation and usage, as well as the tools that can be used to collect data *before*, *during*, and *after* the slowness period.

---++ Performance Troubleshooting Basics

---+++ Keys to Understanding Performance Issues

   * Performance is relative to the user. We need to know the expected time under normal conditions and the actual time when experiencing slowness.
   * The business need of the tool. What is used the most: SCM, reports, planning?
   * Performance issues can occur because of many factors:
      1. *Where* is the performance degradation occurring?
         a. Only one user or all users? (for example, certain users located in a certain building)
         a. Is the performance degradation across the whole server?
      1. *When* is the performance degradation occurring? (all the time, under light load, under heavy load)
      1. *What* exactly is slow? SCM, load plans, builds, reports, dashboards? Or only certain components? (for example, loading plans is slow, but work items are fine)
      1. Are there any hangs, crashes, Out-of-Memory (OOM) errors, or high CPU usage?

---+++ Server versus Client side

When answering the question "where", we can normally detect whether the slowness is caused by a server-side issue (all users affected) or a client-side issue (some users affected). For example, all users connecting via VPN are affected while the others are working fine, or a single user is unable to access the application when using Visual Studio (a configuration problem on a single machine).

<div style="text-align:center;"><img alt="serverVSclient.PNG" src="%ATTACHURL%/serverVSclient.PNG" width="70%"/><br><br>
__Figure 1: Server versus Client performance indicators__
</div>

---+++ Characterizing the Performance Problem

---++++ The Duration (How long is it taking?)
Collect information on several data points, taking note of the exact date and time, as well as the amount of time it takes to complete the task or activity. Why? To be able to compare the expected time with the actual time during the slowness, and to understand what is slow.

   * Prepare a #Test using tasks related to the slowest activity for the user (a build, a specific query, saving a Work Item (WI)). Always use the same tasks:
      * Choose a big Project Area (PA)
      * Choose a Plan view with several WIs
      * Choose a word for a quick search
      * Prepare a Query to run, for instance open WIs on a selected PA
      * Check in and deliver a sample Project Area
      * Run a sample Build

---++++ Performance issues at the Server and Client side:

   * Gather a screenshot of each of the following pages from the web UI. Ensure you gather a screenshot for each application (jts, ccm, qm/jazz, rm):<verbatim>
https://<server>:<port>/<jts|ccm|qm>/admin#action=com.ibm.team.repository.admin.activeServices
https://<server>:<port>/<jts|ccm|qm>/service/com.ibm.team.repository.service.internal.counters.ICounterContentService
https://<server>:<port>/<jts|ccm|qm>/admin#action=com.ibm.team.repository.admin.serverStatus
https://<server>:<port>/<jts|ccm|qm>/admin#action=com.ibm.team.repository.admin.serverDiagnostics
https://<server>:<port>/<jts|ccm|qm>/service/admin?internal=true#action=com.ibm.team.repository.admin.statistics</verbatim>
   * How many users are on the system at the time of the slowness? This question is difficult to answer precisely; however, Performance Monitoring Infrastructure (PMI) monitoring on !WebSphere shows how many requests are currently being processed.

---++++ Client side

   * Collect Metronome output and test performance during execution of the #Test steps
   * Collect HttpWatch output during execution of the #Test steps

---+++ Understanding the Environment

   1. Operating system of the user and server (RHEL, SUSE, Windows)
   1. Database vendor
   1. Application Server (Tomcat or !WebSphere)
   1. Client type (web or Eclipse) and client version; web browser (Firefox, Internet Explorer, Chrome) and browser version
   1. System specifications: RAM, heap size, CPU, virtualized or physical machine?
   1. Network information:
      a. Where is the DB located relative to the Application Server?
      a. Where are the users located relative to the Application Server?
      a. Is there a firewall, load balancer, or proxy?
      a. How and when are the maintenance tasks executed? (backup, reindex, update statistics, automatic tasks to clean up temporary folders)
   1. Has a server rename occurred?
   1. Is your server using IPv6? Are your clients using IPv6?
   1. Which applications, products, or APIs are accessing this Jazz Team Server (JTS) or DB repository? Requirements Management (RM), Quality Management (QM), Change and Configuration Management (CCM), Jazz Reporting Service (rb), !BuildForge, !TaskTop, custom applications?
   1. What else is running in this environment, sharing the hardware resources?
   1. Physical drive location (indices location): NFS, local or shared paths

---+++ Understanding the Size and Usage

   1. How many active users? (JTS/Admin active users)
   1. Are all the users in the same time zone as the server?
   1. How many concurrent users? (PMI settings on !WebSphere Application Server (WAS))
   1. What are the days and times with the most usage?
   1. How many Project Areas?
   1. What is the size of the biggest Project Area? And the smallest one?
   1. What does the user use the most? Reporting, planning, SCM?
   1. Regarding check-in and delivery, what is the size of the biggest stream to be delivered?
   1. What is the size of your DB repository?
   1. How many builds and build engines are there, and how much source code is involved? Where is the build being executed (network, version)?

---++ Data Collection

---+++ Collect the Configuration

---++++ Client side

   * Parameters used to start Eclipse

---++++ Server side

   * JVM parameters used for each application
   * (WAS) !WebContainer thread pool size for each application: Servers -> Server Types -> !WebSphere application server -> [Server Name] -> Thread Pools -> !WebContainer.
   * (WAS) Asynchronous or synchronous mode: check whether the property com.ibm.ws.webcontainer.channelwritetype exists under Servers -> Server Types -> !WebSphere application server -> [Server Name] -> Container Settings -> Web Container Settings -> Web Container -> Custom Properties.
   * Value of "Maximum Record Count" defined in the Advanced Properties for the BIRT reports
   * DB connection parameters for each application:<verbatim>
https://<server>:<port>/<jts|ccm|qm>/admin#action=com.ibm.team.repository.admin.configureDatabaseConnection</verbatim>
   * IHS/plug-in configuration files

---+++ Prepare the Data Collection

---++++ Client side

   * Activate [[https://jazz.net/wiki/bin/view/Deployment/JazzMetronomeToolKeepsUsHonest][Metronome]] on the Eclipse client
   * (Firefox) Download and install [[http://getfirebug.com/downloads][Firebug]]
   * (Firefox) Download and install [[http://www.softwareishard.com/blog/netexport/][NetExport]]
   * (Firefox and Internet Explorer) Download and install [[https://www.httpwatch.com/download/][HttpWatch]]

---++++ Server side

   * Prepare a #TestConnection: ping and traceroute between the application server and the database.
   * Enable PMI data collection (see [[https://jazz.net/library/article/1430][Tuning WebSphere servers for Rational Team Concert performance]]):
      * Monitoring and Tuning -> Performance Monitoring Infrastructure (PMI) -> [Server Name] -> Enable Performance Monitoring Infrastructure (PMI)
      * Performance Monitoring Infrastructure (PMI) -> server1 -> Custom monitoring level:<verbatim>
Threads                    Enable ActiveCount PoolSize
Servlet Session manager    Enable ActiveCount and LiveCount</verbatim>
   * (WAS) Enable verbose garbage collection logging by adding -verbose:gc to your JVM arguments or by following [[http://www-01.ibm.com/support/docview.wss?uid=swg21114927][Enabling verbose garbage collection (verboseGC) in WebSphere Application Server]]
   * (Tomcat) Enable garbage collection logging through the following JVM options in CATALINA_OPTS:<verbatim>
-verbose:gc
-Xloggc:$CATALINA_HOME/logs/gc.log (Linux/UNIX) or -Xloggc:%CATALINA_HOME%/logs/gc.log (Windows)
-XX:+PrintHeapAtGC (only if more detail about memory consumption is needed; generates extra output)
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+HeapDumpOnOutOfMemoryError</verbatim>

---+++++ WAIT

   1. Register at [[https://wait.ibm.com][WAIT]] and read the WAIT manual
   1. Download the script on each application machine (jts/ccm/qm/rm)

---++++ Out-of-Memory (OOM)

   * (WAS) Read [[http://www.ibm.com/support/docview.wss?uid=swg21138587][MustGather: Out of Memory errors with WebSphere Application Server on AIX, Linux, or Windows]] and choose the most suitable way to collect data. If needed, download and install the IBM Support Assistant Data Collector on the machines facing the OOM (jts/ccm/qm/rm)

---+++ Data Collection during Slow Performance

---++++ Server side

   * Engage the database support team to check for locks, contention, and abnormal DB behavior.
   * Execute the #TestConnection: ping/traceroute from the application server to the database
   * Collect WAIT data:<verbatim>
./waitDataCollector.sh --sleep 30 --iters 10 --javacoreDir <the full path to the directory where javacores will be written by the JVM> [PID]</verbatim>

---+++++ Hang, Crash or High CPU

   * (Windows) Capture screenshots showing the Task Manager sorted by CPU. Also, from the Task Manager/Resource Monitor button, take screenshots of all tabs, expanding the sections.

---++++ Server and Client side, if System is Accessible:

   * Execute the #Test steps prepared in the Characterizing the Performance Problem section
   * Gather a screenshot of each of the following pages from the web UI. Ensure you gather a screenshot for each application (jts, ccm, qm/jazz, rm):<verbatim>
https://<server>:<port>/<jts|ccm|qm>/admin#action=com.ibm.team.repository.admin.activeServices
https://<server>:<port>/<jts|ccm|qm>/service/com.ibm.team.repository.service.internal.counters.ICounterContentService
https://<server>:<port>/<jts|ccm|qm>/admin#action=com.ibm.team.repository.admin.serverStatus
https://<server>:<port>/<jts|ccm|qm>/admin#action=com.ibm.team.repository.admin.serverDiagnostics
https://<server>:<port>/<jts|ccm|qm>/service/admin?internal=true#action=com.ibm.team.repository.admin.statistics</verbatim>
   * How many users are on the system at the time of the slowness? This question is difficult to answer precisely; however, PMI monitoring on !WebSphere shows how many requests are currently being processed.
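
Because the same five pages must be captured for every application, it can help to generate the full list of URLs up front. The sketch below only fills in the URL patterns listed above. The function name =diagnostic_urls=, the default context roots, and the example host and port are assumptions for illustration; add =rm= or =jazz= to =contexts= if your deployment uses them.

```python
# Page paths copied from the list above, in the same order.
PAGES = [
    "admin#action=com.ibm.team.repository.admin.activeServices",
    "service/com.ibm.team.repository.service.internal.counters.ICounterContentService",
    "admin#action=com.ibm.team.repository.admin.serverStatus",
    "admin#action=com.ibm.team.repository.admin.serverDiagnostics",
    "service/admin?internal=true#action=com.ibm.team.repository.admin.statistics",
]


def diagnostic_urls(server, port, contexts=("jts", "ccm", "qm")):
    """Return every diagnostic page URL for each application context root.

    The default context roots mirror the <jts|ccm|qm> placeholder in the
    patterns; they are an assumption about the deployment, not a fixed list.
    """
    urls = []
    for context in contexts:
        base = f"https://{server}:{port}/{context}"
        for page in PAGES:
            urls.append(f"{base}/{page}")
    return urls
```

For example, =diagnostic_urls("elm.example.com", 9443)= (a hypothetical host and port) yields 15 URLs, five for each of the three default context roots, which can be used as a checklist while taking the screenshots.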

---++++ Client side

   * Collect Metronome output and test performance during execution of the #Test steps
   * Collect HttpWatch output during execution of the #Test steps

---+++ Data Collection after Slow Performance

---++++ Server side

   * ISALite (run on all machines: jts/ccm/qm/rm)
   * IHS and plug-in logs: http_plugin.log, access_log, error.log
   * Statistics: ensure you gather a screenshot for each application (jts, ccm, qm/jazz, rm):<verbatim>
https://<hostname>/<context root>/service/admin?internal=true#action=com.ibm.team.repository.admin.statistics</verbatim>

---++++ Out-of-Memory (OOM)

   * Heapdump and core files created
   * If a core file exists, run JExtract, which is included with the JRE in server/jre/bin:<verbatim>
jextract.exe <core file location></verbatim>

---++++ Client side

   * Eclipse client log

---+++++!! Related topics: [[InitialTroubleshootingInvestigation][Initial Troubleshooting Investigation]], [[HowToStartATroubleshootingAssessment][How to start a troubleshooting assessment]], [[PerformanceTroubleshooting][Performance troubleshooting]], [[BrowserPerformance][Browser performance]]

---+++++!! External links:
   * [[https://www.ibm.com][IBM]]
   * [[https://www.ibm.com/support/pages/mustgather-performance-hang-or-high-cpu-issues-elm-applications?][Support must gather for performance hang or high CPU issues]]

---+++++!! Additional contributors: Main.DianeEveritt

---+++++!! Questions and comments:
%COMMENT{type="below" target="RTCTroubleshootingBasicsandDataCollectionComments" button="Submit"}%

<sticky></div></sticky>
Copyright © by IBM and non-IBM contributing authors. All material on this collaboration platform is the property of the contributing authors.