r15 - 2014-12-05 - 18:55:24 - Main.bsilverm

Monitoring and Troubleshooting Guide for IBM Rational Collaborative Lifecycle Management (CLM)

Authors: BenSilverman
Build basis: CLM 4.0.x / 5.0

This article is meant to provide a view into the monitoring capabilities available for checking system stability and troubleshooting problems across the CLM application and software stack. A troubleshooting overview is provided in the IBM Rational Collaborative Lifecycle Management Information Center which discusses monitoring and troubleshooting at a high level. See Troubleshooting Jazz Team Server for details.

Server Status Summary

The status summary offers a basic view of the system and application status. To access the Server Status Summary for Change and Configuration Management (CCM), Rational Quality Manager (QM) or Jazz Team Server (JTS), navigate to https://<server>:<port>/<application>/admin > Server > Status Summary. For Rational Requirements Management (RM), this information is provided on the /rmadmin page. The status summary shows the current database status, server uptime, diagnostic warnings, JVM memory usage, version information, license status, VM details, and a service error summary.


server-status-summary.jpg

Jazz Team Server Diagnostics

Checking system stability involves a number of factors. IBM provides diagnostics through the Jazz Team Server administration panel (jts/admin > Server > Diagnostics) that perform a health check on the core application functions listed below. Diagnostics can be run from the web UI or from the command line using repotools.

  • System Clock
  • JFS Index
  • JFS Storage
  • Database Indices
  • Background Task
  • Network address resolution
  • Database Connectivity
  • Database Performance Statistics
  • JVM
  • Applications and Friends (Integration check)

The diagnostics can produce a zip file that contains more verbose details about each diagnostic check, along with the application logs. In the event of a diagnostic warning, an administrator can check the diagnostic output for values that exceeded the benchmark thresholds. For example, a database latency problem would result in a diagnostic failure and verbose output in the com.ibm.team.repository.service.diagnostics.database.internal.databaseStatisticsDiagnostic.html file produced by the diagnostics. The problem could be investigated by a DBA or submitted to IBM Support for further investigation.
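As a rough illustration of reviewing the diagnostics archive, the Python sketch below lists the HTML reports inside a zip file and flags those that mention a keyword. The function name find_flagged_reports, the assumption that each check is reported as an .html entry, and the assumption that a problem shows up as the word "warning" or "error" in the report text are all illustrative guesses, not documented behavior of the diagnostics.

```python
import zipfile


def find_flagged_reports(zip_path, keywords=("warning", "error")):
    """Return names of .html entries whose text mentions any keyword.

    Assumption: diagnostic reports are .html files inside the archive and
    a problem appears as one of the keywords in the report text.
    """
    flagged = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # Skip logs and other non-report entries.
            if not name.lower().endswith(".html"):
                continue
            text = archive.read(name).decode("utf-8", errors="replace").lower()
            if any(keyword in text for keyword in keywords):
                flagged.append(name)
    return sorted(flagged)
```

The returned names only narrow down which reports to open first; the reports themselves still need to be read by an administrator or DBA.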

IBM Support Assistant Lite (ISALite)

In the event of a diagnostic failure or any unknown application failure requiring investigation, it is recommended to execute the ISALite data collector utility as close to the failure as possible if help from IBM Support is needed to troubleshoot the problem. Doing so will help ensure that the failure information is captured in the data that is collected. The ISALite data collector will collect all application logs and configuration files as well as general system diagnostics (for example: free disk space, free native memory, network checks, environment variables, and more). This information can be analyzed by an administrator or by IBM Support if further assistance is needed.

JVM Memory Usage

To view the memory available to the CLM server, use a utility that inspects the JVM rather than operating system utilities such as “Task Manager” or “top”. A simple way to check JVM memory usage for a given CLM application is to use the Status Summary panel described in the section titled Server Status Summary. If all applications are running under the same JVM (for example: running on a single Tomcat server or single WebSphere profile), the VM Memory Usage will show the same information regardless of which application's Server Status Summary you are looking at.

Refer to the RM Administration section for checking available memory for Requirements Management (RM) if RM is running under a separate JVM.


jvm-memory-usage.jpg

The Maximum Memory Allocation and Current Memory Allocation indicate the amount of memory allocated to the JVM and should show the same value if the recommended JVM sizing is in place.

The Free Memory indicates the percent of total memory that is not currently used. If this number is reported as critically low, the JVM might be undersized given the amount of activity on the server or there may be a problem that requires investigation. In the event of a memory related problem, the system administrator should run the server diagnostics, collect ISALite data and collect a Java Heap and Thread Dump. Contact IBM Rational Support if assistance is needed to analyze the data collected.

Tivoli Performance Viewer can also be used for real-time monitoring of the JVMs.

Java Heap and Thread dumps

Heap Dump

A Java Heap Dump captures the state of the objects in memory at the time the dump was taken. In the case of low memory, analyzing a heap dump can be useful to determine potential memory leaks or whether the JVM size needs to be adjusted. The dump can be analyzed using a heap dump analyzer tool such as IBM Heap Dump Analyzer.

A heap dump can be generated in a number of ways depending on the application server being used.

WebSphere

Tomcat

  • A heap dump is generated automatically in the event of a system crash. The default location for a heap dump is in the CLM server directory (for example: C:\IBM\JazzTeamServer\server) and the file will have a .phd extension.
  • A heap dump can be generated manually by pressing Ctrl+Pause in the Tomcat console when IBM_HEAPDUMP is set as a Tomcat environment variable.
  • A heap dump can be taken manually using RM Administration.

Thread Dump

A Java Thread Dump captures the state of the current threads being processed at the time the dump was taken. The dump also displays JVM properties, arguments, and environment variables, as well as information about memory allocation (free versus allocated). A thread dump will often be requested by IBM Support when it is necessary to determine what the application was doing at the time of a problem. A thread dump can be analyzed using a tool like IBM Thread Dump Analyzer. Since a single Java thread dump only captures a snapshot of the JVM activity at the precise moment the dump was taken, it may be necessary to take multiple thread dumps periodically to get an idea of what the application is doing over a period of time. The IBM Whole-system Analysis of Idle Time (WAIT) utility can help generate that data.

WebSphere

Tomcat

  • A thread dump is generated automatically in the event of a system crash. The default location for the thread dump is in the CLM server directory (for example: C:\IBM\JazzTeamServer\server) and will have a .txt extension.
  • Using IBM WAIT
  • Manually taken using RM Administration
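
The idea of taking several thread dumps at a fixed interval can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a replacement for IBM WAIT: it assumes the JVM runs on a Unix-like system, where sending SIGQUIT (the same signal as kill -3) asks the JVM to write a thread dump without stopping the process. The function name request_thread_dumps and its parameters are hypothetical.

```python
import os
import signal
import time


def request_thread_dumps(pid, count=3, interval_seconds=30.0,
                         send_signal=os.kill, sleep=time.sleep):
    """Ask the JVM with process id `pid` for `count` thread dumps.

    Assumption: Unix-like host, where SIGQUIT triggers a thread dump.
    `send_signal` and `sleep` are injectable so the function can be
    exercised without touching a real process.
    """
    for attempt in range(count):
        send_signal(pid, signal.SIGQUIT)
        # Pause between dumps, but not after the last one.
        if attempt < count - 1:
            sleep(interval_seconds)
    return count
```

Spacing the dumps out makes it easier to tell a thread that is briefly busy from one that is stuck in the same stack across every dump.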

IBM Whole-system Analysis of Idle Time (WAIT)

In the event of a problem, IBM support may request a collection of data be taken using IBM Whole-system Analysis of Idle Time (WAIT). The WAIT utility provides a method for automatically generating thread dumps at a set interval over a period of time. WAIT also provides a web interface that shows a high level overview of the data once it has been captured. For details on downloading and using IBM WAIT, visit wait.ibm.com.

Active Services

Active Services is available from the application administration panel (for example: https://<server>:<port>/jts/admin). It displays a list of all CLM services that are currently active, along with the total time each service has been running, the user who requested the service, and a Java stack trace of the service activity. In the event of an unexpected problem, checking Active Services can show which services are running at the time of the problem. It is not unusual to see services listed during normal usage. However, if the same services appear in the list every time a particular problem occurs, the service activity may need to be investigated by IBM Support.

Note: A long running service does not necessarily indicate a problem. For example, the Extract-Transform-Load (ETL) process may take over an hour to execute if there have been many changes to the data since the last successful ETL.


active-services.jpg

Issued Leases

Issued Leases is available from the application administration panel (for example: https://<server>:<port>/jts/admin). This is where you can check the number of users on the system with active leases. It may be important to note how many users were on the system at the time of a problem. If problems are seen when user load is high, it may be useful to increase the thread pool size (the suggested increment is 25 at a time).

RM Administration

Administration and monitoring can be performed with RM by using the built in administration panel. This administration console gives the administrator the ability to view a variety of statistics using conditional formatting to highlight numbers out of range according to our benchmarks.

The administration console is accessed using the following URLs, depending on the version of the application:

  • 4.0.x: https://<server>:<port>/rm/rmadmin
  • 5.0.x: https://<server>:<port>/rm/admin, then click the "Debug" tab to access the console

Many tasks in the RM administration panel should only be executed under the guidance of IBM Support. The common tasks are as follows:

Logging

Use the Logging section to stream application logs in real-time. Loggers can be configured and updated without requiring a server restart if the administrator needs to view more verbose logging for a particular component. JVM properties can also be accessed at the bottom of the Logging section.

Note: Any changes to the log configuration are lost once the server is restarted. To permanently change the log configuration, make the changes in the log4j.properties file found in the /server/conf/rm directory.

Advanced Properties

Advanced server properties can be updated using this page. Once the server is restarted, these settings are no longer in effect. Only change advanced server properties under guidance of IBM Support.

Debug Service

The Debug Service can be used to manually execute requests to Rational Requirements Composer (RRC) or JTS, for example, viewing the contents of a particular artifact and traversing related resources. To enable the debug service, the property com.ibm.rdm.fronting.server.debug.enabled must be set to TRUE in the advanced properties. IBM Support may request the results of a request made through the debug service when troubleshooting data-related problems.

SPARQL Query

The SPARQL Query service is available for executing SPARQL queries against the RRC repository or against a particular RRC project. Support may request specific queries be executed to count data or identify artifacts matching certain criteria in the event of a data related problem. For example, to count the number of resources in the RRC repository, the following query can be issued:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rm: <http://www.ibm.com/xmlns/rdm/rdf/>
SELECT (count (distinct *) as ?count)
WHERE {
?resource rdf:type rm:Artifact
}

Module Analysis

Users can perform an analysis on a module to determine whether the module is structurally sound. The analysis will return the overall status on the first line of the output, followed by the result of consistency checks for the following problems:

  • Bindings present in structure but missing from index
  • Bindings present in index but missing from structure
  • Issues with bound artifacts

Note: If problems are encountered during the consistency check, it is advised to contact support with the output of the analysis.

Indexing

Indexing activity is performed when a write is being processed in the repository. The status of the index can be monitored through the /indexing page, or through the RM administration console (rm/rmadmin in 4.0.x, rm/admin in 5.0.x). The backlog of the indexer should be low on a well-performing system. If the backlog of a given index is high (for example: over 1,000), this indicates that many writes are being performed and the indexer has not caught up. Symptoms of heavy indexing include slow performance and users not seeing data immediately after creation or modification. The example below shows how the RM query index backlog can be monitored through https://<server>:<port>/jts/indexing (4.x) or https://<server>:<port>/rm/indexing (5.x). In this example, there are no items in the index backlog, which means no items are being indexed:


indexing.jpg
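
The backlog rule of thumb above can be expressed as a small Python sketch. It assumes the backlog counts have already been read from the indexing page (for example, copied by hand); the function name summarize_backlogs and the index names in the usage example are hypothetical, and the default threshold of 1,000 is only the example figure quoted above.

```python
def summarize_backlogs(backlogs, threshold=1000):
    """Return the names of indexes whose backlog exceeds `threshold`.

    `backlogs` maps an index name to its pending item count. The result
    is ordered from the largest backlog to the smallest.
    """
    high = {name: size for name, size in backlogs.items() if size > threshold}
    return sorted(high, key=high.get, reverse=True)


# Hypothetical readings taken from the indexing page.
print(summarize_backlogs({"query": 0, "text": 2500, "history": 1001}))
```

A single high reading is less telling than a backlog that stays high across several checks, so it is worth sampling the page more than once.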

For further details about how to tell whether a reindex is currently running on your Jazz Team Server and how long it may be expected to run, refer to technote 1662167.

In CLM 5.0.1 and higher, you can also use the -verifyJFSIndexes command to validate the current state of the indices.

Maintenance Tasks

Server maintenance is important in order to ensure the highest level of stability and performance.

repotools -deleteJFSResources

This command is recommended for cleaning the database of resources that have been deleted from RRC projects. RRC resources are only “soft-deleted” from the application when a user deletes an artifact. Running this command will help ensure the database maintains an appropriate size by permanently deleting archived resources from existing or archived projects.

Please note that when running this command against non-archived RRC projects, you will lose the ability to traverse artifacts in RRC project baselines that have been deleted as a result of running the command.

repotools -compacttdb

This task is recommended for compacting the size of the indices on the file system. The job can also be scheduled to run at a set interval in versions 5.0 and higher.

Note: Executing the repotools -compacttdb command will require a temporary storage location twice the size of the index directory. Use the tempdir parameter to specify the temporary location.
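
Before running the compaction, the space requirement in the note above can be checked with a short Python sketch. It measures the index directory and compares twice that size with the free space at the intended temporary location. The function names and the paths in the usage note are hypothetical; only the "twice the size of the index directory" rule comes from the note above.

```python
import os
import shutil


def directory_size_bytes(path):
    """Sum the sizes of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symbolic links so linked files are not counted.
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def has_room_for_compaction(index_dir, temp_dir, factor=2):
    """Return (enough_space, required_bytes, free_bytes) for temp_dir."""
    required = factor * directory_size_bytes(index_dir)
    free = shutil.disk_usage(temp_dir).free
    return free >= required, required, free
```

For example, has_room_for_compaction("/path/to/index", "/path/to/temp") returns a tuple whose first element is False when the temporary location is too small.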

repotools -reindex

A reindex may be required in the event of an unplanned server outage. If the system is not shut down properly, the indices on the file system can become out of sync with the database which will require a reindex be executed. If you have a recent good backup of your indices, it is recommended you restore the backup and start the server to allow the server to synchronize the indices.

If no backup is available, use the -reindex command and set the scope parameter to “all” (for example: ./repotools-jts.sh -reindex scope=all).

In CLM 5.0.1 and higher, you can also use the -synchronizeJFSIndexes command offline to synchronize the indices after restoring the last good copy.

Database Reorg and Runstats

This is a normal database administration task to be completed by a Database Administrator (DBA). Any time a large amount of data is added to a database, a reorg should be run and statistics (runstats) should be updated for the database tables. Running the DB2 command below helps ensure optimal performance when accessing the database. Comparable commands exist for other supported DBMS vendors (such as Oracle, SQL Server).

DB2: DB2 REORGCHK UPDATE STATISTICS ON TABLE ALL

Online Verify

Online Verify is a utility provided to check the data in the database for inconsistencies. In the event of a database related error (for example: a database error appearing in the application logs), Online Verify can be executed to report potential database integrity issues. Consult IBM Support in the event of an error reported by the Online Verify utility. Each application has a separate verify utility, with the exception of RM. To perform the Online Verify for RM, use the JTS verifier utility provided in the link above.

Application Logs

Application logs contain important information about the CLM server applications. Consult the application logs in the event of an error presented to the user in an application's web interface. If there is an error ID associated with the error message, the administrator can search for that ID in the application log to ensure the appropriate message is being investigated. IBM Support may request that log levels be adjusted for certain applications or components by changing the log level in the log4j.properties file associated with that application or component.

JTS.log (Jazz Team Server)

  • jts-etl.log – Contains logging for Extract Transform Load (ETL) process for JTS and RM

RM.log (Requirements Management (RRC))

  • Converter.log (Converter application)
  • rrdg.log – Contains logging for Rational Reporting for Document Generation (RRDG)

QM.log (Quality Manager (RQM))

  • qm-etl.log – Contains logging for Extract Transform Load (ETL) process for QM

CCM.log (Change and Configuration Management (RTC))

  • ccm-etl.log – Contains logging for Extract Transform Load (ETL) process for CCM
  • admin.log (Lifecycle Project Administration (LPA))

Log file locations

  • Tomcat: \server\logs
  • WebSphere: \AppServer\profiles\AppSrv01\logs
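
Searching the application logs for an error ID, as described above, can be sketched in Python like this. The function name find_error_id, the "*.log" file pattern, and the directory and error ID in the usage note are illustrative assumptions; adjust them to the log location and ID in your environment.

```python
from pathlib import Path


def find_error_id(log_dir, error_id, pattern="*.log"):
    """Return (file name, line number, line text) for each matching line.

    Only files directly inside `log_dir` that match `pattern` are read.
    """
    matches = []
    for log_file in sorted(Path(log_dir).glob(pattern)):
        with log_file.open(encoding="utf-8", errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                if error_id in line:
                    matches.append((log_file.name, line_number, line.rstrip("\n")))
    return matches
```

For example, find_error_id("/path/to/server/logs", "SOME-ERROR-ID") returns an empty list when the ID does not appear in any log file, which suggests the message was logged elsewhere or the logs have rolled over.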

General System Statistics

General system monitoring should not be ignored when checking a system for stability or troubleshooting a problem. Poor network latency, high CPU usage, poor database latency, high disk IO or low disk space will always degrade the performance of any system. The following checks should be considered when evaluating system stability:

  • Ensure latency between the database and application server is less than 2 ms

  • Ensure free disk space is sufficient

  • Ensure free native memory is sufficient

  • Ensure CPU utilization is not consistently high

  • Ensure disk IO is not consistently high
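
Two of the checks above, database latency and free disk space, can be approximated with the Python sketch below. It is only a sketch under stated assumptions: the time to open a TCP connection is used as a stand-in for network latency (it includes connection setup overhead, so it slightly overstates the round trip), and the function names, host, and port are hypothetical. Run it from the application server against the database host and listener port.

```python
import shutil
import socket
import time


def tcp_connect_latency_ms(host, port, samples=5, timeout=2.0):
    """Return the fastest observed TCP connect time in milliseconds."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.perf_counter() - start
        timings.append(elapsed * 1000.0)
    # The minimum is the least affected by momentary scheduling noise.
    return min(timings)


def free_disk_percent(path):
    """Return the percentage of free space on the volume holding `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total
```

A result from tcp_connect_latency_ms that sits well above the 2 ms guideline is a reason to involve the network or database team; it is not proof of a problem on its own.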

Tivoli Performance Viewer

A WebSphere administrator can use the Tivoli Performance Viewer to monitor the health of the application server and the JVMs. CPU utilization, memory consumption, thread pool usage, heap size and more can be monitored in real time to observe performance patterns of the system during peak usage.

The Performance Viewer can be accessed through the WebSphere Administration Console (for example: http://<server>:<port>/ibm/console > Monitoring and Tuning > Performance Viewer > Current Activity > server_name > Performance Modules).

If you are observing attributes reaching the thresholds during peak usage, tuning or troubleshooting may be required depending on the situation.


tivoli-performance-viewer.jpg

CLM Performance Health Check Widget

Use the performance health check widget for a high level view of system performance. The health check widget can be added from any CLM dashboard as shown below.


clm-performance-healthcheck-widget.jpg

Once the widget has been added to the dashboard, it can be executed manually to perform a number of connectivity tests critical to the overall performance of the system. See below for an example of a good performance test.


performance-healthcheck.jpg
